Over the past decade, the oil and gas industry, both onshore and offshore, has been changing the way subsurface data is used to develop and constrain predictive models. The foundation of any advanced analytical effort, in any industry, is the quality, accessibility, and coverage of the data set available to the user training a predictive model. Whether the approach is hierarchical clustering, random forest, multidimensional engines, or more advanced multilinear regression analysis, it must begin with more knowns than the unknowns the user is trying to predict.
RFDbase Functions
GeoMark’s RFDbase is a volume of information and raw data from every major petroleum basin in the world that can be used to fuel a search engine or training engine to drive predictions (Fig. 1). It does so by linking the mechanisms representing geology, geochemistry, petrophysics, and engineering components/inputs to data-science efforts. Covering the needed end-member variables for both rock and fluid properties, the RFDbase contains laboratory-based data that can be used in analysis attempting to link subsurface characteristics to wellhead performance. The geoscience inputs can be constrained while the user constrains the production information (e.g., lateral length and completion size) to train the model.
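One common way to constrain the production information is to normalize it by lateral length and completion size, so that wells can be compared on their rock and fluid properties. The sketch below illustrates the idea only. The `Well` record, its field names, and all values are hypothetical and are not taken from RFDbase.

```python
from dataclasses import dataclass


@dataclass
class Well:
    # Hypothetical record: field names and units are assumptions for illustration.
    name: str
    toc_wt_pct: float         # laboratory total organic carbon, wt%
    api_gravity: float        # produced-oil API gravity, degrees
    lateral_length_ft: float  # completed lateral length, ft
    proppant_lb: float        # total proppant pumped, lb
    cum_oil_bbl: float        # cumulative oil over a fixed period, bbl


def normalized_row(well: Well) -> dict:
    """Scale completion size and production by lateral length."""
    return {
        "name": well.name,
        "toc_wt_pct": well.toc_wt_pct,
        "api_gravity": well.api_gravity,
        "proppant_lb_per_ft": well.proppant_lb / well.lateral_length_ft,
        "oil_bbl_per_1000ft": well.cum_oil_bbl / (well.lateral_length_ft / 1000.0),
    }


# Made-up example wells, not real data.
wells = [
    Well("A", 4.2, 42.0, 5000.0, 10_000_000.0, 150_000.0),
    Well("B", 3.1, 38.0, 10000.0, 15_000_000.0, 240_000.0),
]
rows = [normalized_row(w) for w in wells]
for row in rows:
    print(row)
```

With the engineering inputs normalized this way, the remaining differences between wells are easier to attribute to the geoscience inputs, although other normalizations may suit a given play better.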
Examples
As a simplified illustration of how RFDbase access can aid these efforts, the schematic in Fig. 2 likens the components of a predictive model to fueling, driving, and then upgrading a motorized vehicle.
Ultimately, the effort is trying to associate known properties with wellhead performance given several factors. Multiple x, y, and z dimension values may correlate; however, it is critical that the context of the data guides the developed model. Otherwise, one might find that blueberries and peanut butter can be correlated by a certain function, regression, or engine, but the question needs to be asked, “Should they be?” Fig. 3 presents a correlation in the data that could be fit at least three different ways.
Which is the most accurate? Which offers the best approach to predicting the minimum, maximum, mode, distribution, and average of the populations? Enhanced machine learning and data science offer more robust ways of testing the candidate statistical solutions and selecting the best-fit approach.
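One way to approach the question of which fit is most accurate is to score each candidate on data it was not fitted to. The sketch below is a minimal illustration, not the method behind Fig. 3. It compares three assumed candidate forms (linear, logarithmic, and power law) by leave-one-out root-mean-square error, using synthetic data generated from a known power law. The function names and the choice of candidates are assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Each candidate is a straight line in transformed space:
# (x transform, y transform, inverse of the y transform).
MODELS = {
    "linear": (lambda x: x, lambda y: y, lambda v: v),
    "logarithmic": (math.log, lambda y: y, lambda v: v),
    "power_law": (math.log, math.log, math.exp),
}


def loo_rmse(xs, ys, model):
    """Leave-one-out RMSE, measured in the original units of y."""
    fx, fy, inverse = MODELS[model]
    squared_error = 0.0
    for i in range(len(xs)):
        train_x = [fx(x) for j, x in enumerate(xs) if j != i]
        train_y = [fy(y) for j, y in enumerate(ys) if j != i]
        a, b = fit_line(train_x, train_y)
        prediction = inverse(a + b * fx(xs[i]))
        squared_error += (prediction - ys[i]) ** 2
    return math.sqrt(squared_error / len(xs))


# Synthetic, noise-free data from y = 2 * x**1.5 (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]

scores = {name: loo_rmse(xs, ys, name) for name in MODELS}
best = min(scores, key=scores.get)
print(scores)
print("lowest leave-one-out RMSE:", best)
```

Because the synthetic data follows a power law exactly, the power-law candidate scores lowest here. On real data, a low error alone does not show that the relationship is physically meaningful, which is the point of the blueberries and peanut butter caution above.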
Data Types
GeoMark’s RFDbase contains the following critical pieces of information. Each data type is followed by a list of possible ways the data can be used and what input or property to constrain in the modeling efforts.
- Source rock total organic carbon and programmed pyrolysis
  - Source rock presence and enrichment (quality), working petroleum system, a qualitative proxy for the presence of volatile fluids, thermal maximum/maturity, and fluid mobility
- Fluid properties
  - Includes oil, gas, and water
  - API gravity, estimated gas/oil ratio, water salinity and composition (midstream or facilities applications), gas compositions, maturity, charge and migration, hydrocarbon quality, and fluid mobility
- Rock/petrophysical properties
  - Includes Fourier-transform infrared spectroscopy, X-ray fluorescence, X-ray diffraction, matrix density, bulk density, and total porosity/storage
  - How much fluid-filled storage exists, lithology/mineralogical change, geomechanical changes related to brittleness inferred from lithology, and chemostratigraphy (upwelling environments and anoxic vs. dysoxic settings)
- Pressure/volume/temperature
  - Specific gravity of fluids, gas/oil ratio prediction, formation volume factor constraint, and fluid-phase envelope constraint (early gas breakthrough potential, phase change predictions, and drawdown behavior changes)
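
Two of the properties listed above, API gravity and the specific gravity of the oil, are linked by the standard definition of API gravity at 60°F: SG = 141.5 / (API + 131.5). The short sketch below applies that definition in both directions. The function names are illustrative and the input values are arbitrary examples, not RFDbase data.

```python
def api_to_specific_gravity(api_gravity: float) -> float:
    """Oil specific gravity (relative to water at 60 degF) from API gravity."""
    return 141.5 / (api_gravity + 131.5)


def specific_gravity_to_api(specific_gravity: float) -> float:
    """API gravity in degrees from oil specific gravity at 60 degF."""
    return 141.5 / specific_gravity - 131.5


# Arbitrary example values spanning heavy oil to condensate.
for api in (10.0, 25.0, 40.0, 55.0):
    sg = api_to_specific_gravity(api)
    print(f"{api:5.1f} deg API -> specific gravity {sg:.4f}")
```

An oil of 10°API has the same density as water under this definition, and higher API values correspond to lighter fluids.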