Hydrocarbon production from shale formations has become an essential part of the global energy supply over the past decade. The life of a project in an unconventional play depends significantly on the prediction of estimated ultimate recovery (EUR). However, conventional methods of predicting EUR are less accurate for shale formations, which significantly affects the economic returns of projects in unconventional plays. The objective of this paper is to identify the most important independent variables, including petrophysical and completion parameters, for estimating ultimate recovery with a machine-learning algorithm. A novel machine-learning model based on random forest regression is introduced to predict EUR and to rank the importance of the independent variables.
In this paper, production, petrophysics, and engineering data comprising more than 25 variables from 4,000 wells in the Eagle Ford are summarized for analysis. The data are collected from production monitoring, well logging, well testing, seismic interpretation, and laboratory experiments. This paper has three major components. First, a multivariate linear regression model is created to predict the overall EUR. Second, spatial autocorrelation analysis is conducted to identify whether spatial variables could affect the accuracy of the multivariate regression model. Third, random forest regression models are trained to examine their reliability in predicting EUR with spatially autocorrelated data. The importance of key predictors is also identified. The final models are tuned with optimized hyperparameters. Throughout the paper, the predictive capabilities of each random forest regression model are discussed in detail to understand the physics behind unconventional hydrocarbon production mechanisms.
The results and workflow presented in this paper are insightful and novel. First, the authors test multivariate regression analysis with all the petrophysics and completion variables using the backward elimination method. This widely used model has the limitation of excluding spatial information. To identify the effect of the spatial variables, the authors calculate Moran’s index and find that the data in this study are clustered, that is, spatially autocorrelated. The p-values for overall EUR, oil EUR, and gas EUR are 0.000002, 0.000000, and 0.12, respectively, which the authors take as rejecting the null hypothesis that the data are randomly distributed. To include the spatial information in the prediction, the authors use an advanced machine-learning technique, random forest, to predict EUR from a combination of petrophysics variables, completion variables, and spatial information. The key variables for predicting overall EUR, oil EUR, and gas EUR by random forest regression are identified. However, the importance of the key variables differs between oil EUR and gas EUR. Therefore, the authors split the overall EUR random forest regression model (57% of variance explained) into two prediction models, one for oil EUR and one for gas EUR. The gas EUR random forest regression model performs better (76% explained) than the oil EUR random forest regression model (60% explained).
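To make the spatial autocorrelation test concrete, the following is a minimal sketch of Moran's index with a permutation-based pseudo p-value. It is an illustration under stated assumptions, not the authors' implementation: the binary distance-band weights, the 6 x 6 grid of well locations, the synthetic EUR values, and all function names are hypothetical, and the paper does not say how its weights or p-values were computed.

```python
import math
import random


def distance_band_weights(coords, radius):
    """Binary spatial weights: w[i][j] = 1 if wells i and j lie within `radius`."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                w[i][j] = w[j][i] = 1.0
    return w


def morans_i(values, w):
    """Moran's I = (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2, d = deviation from mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # total weight
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def permutation_p_value(values, w, n_perm=199, seed=0):
    """One-sided pseudo p-value: share of random relabelings with I >= observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data: wells on a 6 x 6 grid with EUR rising smoothly across the field.
coords = [(x, y) for x in range(6) for y in range(6)]
values = [float(x + y) for x, y in coords]
w = distance_band_weights(coords, radius=1.0)
observed, p_value = permutation_p_value(values, w)
print(f"Moran's I = {observed:.3f}, pseudo p-value = {p_value:.3f}")
```

Under the null hypothesis of spatial randomness, the expected value of Moran's I is -1/(n - 1), so a clearly positive index with a small p-value indicates clustering. On this synthetic gradient the index is strongly positive, which is the pattern a linear regression without spatial terms cannot capture.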
This study provides a deeper understanding of unconventional hydrocarbon production prediction from a big-data perspective and proposes a novel and reliable machine-learning model for predicting EUR and evaluating economic returns in the Eagle Ford. Compared with the traditional multivariate regression model, these random forest regression models are more reliable. In addition, the random forest technique can rank the importance of the relevant independent variables, and this ranking can be used to guide and improve data collection and model training in further studies on this topic. The workflow presented in this article can also be used to train models on data from other unconventional resource plays.
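The importance ranking described above can be sketched with scikit-learn's `RandomForestRegressor`. This is a hedged illustration only: the feature names, the synthetic EUR formula, and the hyperparameter values are all assumptions made for the example, and the out-of-bag R² is used here as a stand-in for the "explained" percentages, since the paper does not state which software or metric definition the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_wells = 500

# Hypothetical predictors: completion, petrophysics, and spatial coordinates, scaled to [0, 1].
feature_names = ["lateral_length", "proppant_mass", "porosity", "x_coord", "y_coord"]
X = rng.uniform(0.0, 1.0, size=(n_wells, len(feature_names)))

# Synthetic EUR (an assumption for illustration): dominated by lateral length,
# with a weaker porosity effect, a nonlinear spatial trend, and noise.
eur = (
    3.0 * X[:, 0]
    + 1.0 * X[:, 2]
    + np.sin(3.0 * X[:, 3]) * X[:, 4]
    + rng.normal(0.0, 0.1, size=n_wells)
)

# oob_score=True evaluates each well with the trees that did not see it during training.
model = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=2, oob_score=True, random_state=0
)
model.fit(X, eur)

# Impurity-based importances sum to 1; sort from most to least important.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"Out-of-bag R^2: {model.oob_score_:.2f}")
for name, importance in ranking:
    print(f"{name:>15s}: {importance:.3f}")
```

Because the spatial coordinates enter the model as ordinary predictors, the forest can pick up location-dependent trends without a separate spatial term. Impurity-based importances can be biased toward high-cardinality or correlated predictors, so a ranking like this is best treated as a guide for further data collection rather than as a definitive result.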