Integration of Data-Driven and Physics-Based Models To Better Understand Subsurface Formations

This article briefly reviews two examples of how the integration of data-driven and physics-based models can optimize the history-matching process in reservoir simulation and rate of penetration in drilling operations, leading to more robust and efficient outcomes.



With the advent of smartphones, the world entered an era of data-driven technologies. These fast-moving technologies have significantly changed many aspects of life, from the way the e-commerce industry operates to the way job candidates are vetted in online interviews.

In the past few years, the petroleum industry has increasingly adopted data-driven technologies to generate insights that drive better business outcomes. According to a 2018 McKinsey article (Anders and Zharkeshov 2018), a number of organizations with a stable backbone of streamlined operations in place can benefit from the enhanced productivity that digital- and analytics-driven approaches create. Yet, because of the black-box nature of some data-driven algorithms, industry experts tend to treat them with caution. Engineers want to understand how the different elements of these toolboxes work before wholeheartedly adopting the approach. Processes that are not purely data-driven and are based on established physical laws inspire more confidence in the industry.

One of the challenges the industry faces is an inherently high level of uncertainty. Modeling the response of subsurface reservoirs is highly complex. One major reason is that subsurface reservoirs cannot be observed directly; models only approximate the actual reservoir and may differ from reality. Here, we briefly review two examples of how subsurface operations can be optimized.

Reservoir Simulation: History Matching

Subsurface reservoir models are created to forecast hydrocarbon production and reservoir behavior. The models are built from parameters acquired through techniques such as seismic surveys, well logging, and laboratory core experiments. Properties such as reservoir tops, porosity, permeability, fluid characteristics, and hydrocarbon saturation are estimated from these techniques with varying levels of uncertainty. Commercial simulators use reservoir models governed by equations that capture physical phenomena such as mass balance and conservation of momentum. To have confidence in the predictive capability of these models, simulated results need to be compared with the historical production data of the reservoir. The parameters used in the models are then adjusted until the simulation output matches the historical production data, a process referred to as history matching (Li, Bhark, Gross, et al. 2019).

Traditionally, the static reservoir model is created with the help of geologists, geophysicists, and petrophysicists; it is then passed to reservoir engineers to perform simulation studies. History matching is a nonunique inverse problem. This means that the target response (the observed production history) is known, but many different combinations of parameters can reproduce it. Moreover, the data acquired from reservoirs are sparse, temporal, and obtained through indirect measurements. Thus, the input parameters to a reservoir model carry a relatively wide range of uncertainty. This presents a challenge because a three-dimensional simulation model can contain millions of parameters that affect the simulation results, and exploring all permutations of the input data can be computationally impractical. Engineers tend to rely on their experience and skills to arrive at a history-matched reservoir model that is perceived to be suitable for forecasting reservoir behavior. This is often a manual trial-and-error process of tweaking a subset of the available parameters, which can be delicate, time-consuming, and exhausting. The process can become a nightmare because of the size and complexity of the reservoir and the nonuniqueness of the problem (Sætrom 2019).

Data-driven trends have enabled the industry to use data and algorithms efficiently to drive the history-matching process. The goal of history matching is to minimize the error between simulated and observed field data. Several history-matching methods in use today rely on optimization, linear algebra, Monte Carlo sampling, and Bayesian inference to make the process robust, efficient, and less exhausting. Increasingly, machine-learning algorithms such as sparse regression, principal component analysis, and metric learning are being applied to the problem. Machine learning is helpful in pattern identification. It is used to find correlations between reservoir parameters and the simulated model response. These correlations guide how the input-model parameters are modified and help to achieve an acceptable history-matched model.
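The idea of minimizing the error between simulated and observed data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a simple exponential decline curve stands in for a full reservoir simulator, the "observed" rates are synthetic, and an exhaustive grid search stands in for the optimization and sampling methods mentioned above. The names simulate_rates and misfit are hypothetical.

```python
import math


def simulate_rates(initial_rate, decline_rate, times):
    """Toy forward model (assumption): exponential decline, not a real simulator."""
    return [initial_rate * math.exp(-decline_rate * t) for t in times]


def misfit(simulated, observed):
    """Sum of squared differences between simulated and observed rates."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


# Synthetic "historical" data (assumption): known parameters plus a small,
# fixed perturbation that mimics measurement error.
times = list(range(11))
TRUE_QI, TRUE_D = 1000.0, 0.10
noise = [5.0 if t % 2 == 0 else -5.0 for t in times]
observed = [q + e for q, e in zip(simulate_rates(TRUE_QI, TRUE_D, times), noise)]

# Candidate parameter sets: a coarse grid over the uncertain inputs.
candidates = [(500.0 + 50.0 * i, 0.01 * j) for i in range(21) for j in range(1, 51)]

# "History matching": keep the candidate whose simulated response has the
# smallest misfit against the observed data.
best_qi, best_d = min(
    candidates,
    key=lambda p: misfit(simulate_rates(p[0], p[1], times), observed),
)
best_misfit = misfit(simulate_rates(best_qi, best_d, times), observed)

print(f"best initial rate: {best_qi}, best decline rate: {best_d:.2f}")
print(f"misfit of best candidate: {best_misfit:.1f}")
```

With only two parameters, an exhaustive search is affordable. A real model with millions of parameters cannot be searched this way, which is why the sampling, Bayesian, and machine-learning methods described above are used instead.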

History matching in reservoir simulation is an interesting example of combining data-driven and physics-based modeling. Reservoir physics dictates how the model responds to the input variables, whereas data-driven algorithms govern how those variables are updated.

Drilling Parameters: ROP

The rate of penetration (ROP) is an important drilling parameter that describes the quality of the drilling process; a higher ROP indicates faster drilling, which translates to increased rig productivity and performance. Additionally, in combination with other parameters, ROP can indicate a kick, over- or under-pressured conditions, or stick/slip during drilling operations. Consequently, predicting ROP is important for drilling engineers. Several physics-based models for predicting ROP, such as the Bingham model, the Motahhari model, and the Hareland model, have been presented in the literature (Ardiansyah and Saad 2020). Determining the input parameters of such models is crucial to predicting ROP accurately.

\(\mathrm{ROP} = \alpha \cdot \mathrm{RPM} \cdot \left(\mathrm{WOB}/D_b\right)^{b}\)


Taking the Bingham model (1965) as an example, the input parameters are RPM (rotary speed), WOB (weight on bit), \(D_b\) (bit diameter), and the empirical constants α and b, as shown in the equation above. The empirical constants quantify the ease of drilling through a particular formation and are either determined from offset drilling data points or calibrated for the particular rock sample. They must be determined before these physical models can be used. Because the lithology of the reservoir changes from area to area, the values of these constants carry a wide range of uncertainty (Hegde, Daigle, Millwater, et al. 2017). Therefore, the calculated ROP may suffer from these inaccuracies.
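As an illustration of how the empirical constants might be determined from offset drilling data, the sketch below uses one common approach: taking logarithms of the Bingham equation gives ln(ROP/RPM) = ln(α) + b·ln(WOB/\(D_b\)), which is a straight line that can be fitted by ordinary least squares. This is an assumed calibration procedure, not one prescribed by the cited references. The function names, the data values, and the "true" constants used to generate the synthetic offset data are all hypothetical.

```python
import math


def bingham_rop(alpha, b, rpm, wob, bit_diameter):
    """Bingham-type model from the text: ROP = alpha * RPM * (WOB / D_b) ** b."""
    return alpha * rpm * (wob / bit_diameter) ** b


def calibrate_bingham(rpm, wob, rop, bit_diameter):
    """Fit alpha and b by least squares on the log-transformed model:
    ln(ROP / RPM) = ln(alpha) + b * ln(WOB / D_b)."""
    x = [math.log(w / bit_diameter) for w in wob]
    y = [math.log(r / n) for r, n in zip(rop, rpm)]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx                          # slope of the fitted line
    alpha = math.exp(y_mean - b * x_mean)  # intercept is ln(alpha)
    return alpha, b


# Hypothetical offset-well records (units left unspecified on purpose).
BIT_DIAMETER = 8.5
offset_rpm = [80.0, 100.0, 120.0, 140.0, 160.0]
offset_wob = [10.0, 15.0, 20.0, 25.0, 30.0]

# Synthetic ROP generated from assumed constants so the fit can be checked.
TRUE_ALPHA, TRUE_B = 0.05, 1.2
offset_rop = [
    bingham_rop(TRUE_ALPHA, TRUE_B, n, w, BIT_DIAMETER)
    for n, w in zip(offset_rpm, offset_wob)
]

alpha_fit, b_fit = calibrate_bingham(offset_rpm, offset_wob, offset_rop, BIT_DIAMETER)
print(f"alpha = {alpha_fit:.4f}, b = {b_fit:.4f}")
```

Because the synthetic data contain no noise, the fit recovers the assumed constants. With real drilling data the fitted values would vary with lithology, which is the source of uncertainty described above.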

Determining these constants from real-time drilling data can improve the ROP calculation, and the industry uses data-driven algorithms to address this challenge. Data-driven models use parameters similar to those of physics-based models, such as WOB and RPM, and additionally use data measured during drilling operations. These models are usually initialized with offset well data and are built to predict ROP as a function of feature vectors. Features can include RPM, WOB, and the flow rate of drilling mud, all of which are collected on the drill floor and can be adjusted to optimize ROP. A linear-regression model represents ROP as a linear function of the features; it is simple and works well when the relationship is approximately linear. For nonlinear data, methods such as the Random Forest algorithm can be used. This algorithm predicts ROP by fitting multiple decision trees (based on the feature vectors) to the data and averaging their predictions (Hegde, Daigle, Millwater, et al. 2017).
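The linear-regression case can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the workflow of the cited study: it fits ROP as a linear function of RPM, WOB, and mud flow rate by solving the least-squares normal equations in plain Python. All function names, feature values, and coefficients are hypothetical, and the "offset well" data are synthetic. For nonlinear data, the fitting step would be replaced by a method such as a Random Forest, which is not shown here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_rop(features, rop):
    """Least-squares fit of ROP = c0 + c1*RPM + c2*WOB + c3*flow."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, rop)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_rop(coeffs, feature):
    """Evaluate the fitted linear model for one feature vector."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], feature))


# Hypothetical offset-well feature vectors: (RPM, WOB, mud flow rate).
offset_features = [
    (80.0, 10.0, 500.0), (100.0, 25.0, 450.0), (120.0, 15.0, 600.0),
    (140.0, 30.0, 550.0), (160.0, 20.0, 480.0), (180.0, 35.0, 620.0),
]
# Synthetic ROP from assumed coefficients so the fit can be checked.
true_coeffs = [2.0, 0.1, 0.5, 0.02]
offset_rop = [predict_rop(true_coeffs, f) for f in offset_features]

coeffs = fit_linear_rop(offset_features, offset_rop)
new_point = (130.0, 22.0, 520.0)
predicted = predict_rop(coeffs, new_point)
print(f"predicted ROP at {new_point}: {predicted:.2f}")
```

Once fitted, the model can be evaluated for different combinations of RPM, WOB, and flow rate to look for settings that give a higher predicted ROP, which is the optimization use described above.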

The calculation of ROP thus illustrates an efficient way to use data-driven models to determine elements within physics-based models (the empirical constants) and so increase the accuracy of the results.

Key Takeaways

  • Data-driven approaches combined with physics-based models can attract significant attention from experts in the oil and gas industry.
  • Machine-learning algorithms can be used for finding correlations between reservoir parameters and simulated results for the history-matching process. 
  • Linear and nonlinear algorithms can be implemented to model and optimize ROP based on RPM, WOB, and the flow rate of drilling mud.

Abdul Saboor Khan is a reservoir engineer at Resoptima where he is involved with ensemble-based uncertainty-centric modeling of petroleum reservoirs. He holds a bachelor’s degree in petroleum engineering from Texas A&M University and a master’s degree with a specialization in reservoir engineering from NTNU.

The author acknowledges Muhammad Umer Azam’s contribution to the article. Azam is a design and project engineer with TechnipFMC where he is involved with engineering and design of products related to workover risers. He holds a master’s degree in petroleum technology, specializing in drilling engineering from NTNU.


Anders, B. and Zharkeshov, S. 2018. A New Operating Model for Well Organizations. McKinsey & Company, October 1.

Hegde, C., Daigle, H., Millwater, H., and Gray, K. 2017. "Analysis of rate of penetration (ROP) prediction in drilling using physics-based and data-driven models." Journal of Petroleum Science and Engineering 159: 295-306.

Li, B., Bhark, E.W., Gross, S.J., Billiter, T.C., and Dehghani, K. 2019. "Best practices of assisted history matching using design of experiments." SPE Journal.

Ardiansyah, N. and Saad, B. 2020. "Combining Insight from Physics-Based Models into Data-Driven Model for Predicting Drilling Rate of Penetration." In International Petroleum Technology Conference 2020.

Sætrom, J. 2019. "The Seven Wastes in Reservoir Modelling Projects (and How to Overcome Them)." Resoptima, 12 September.