Artificial intelligence (AI) and machine learning (ML) can expect a flourishing future as the new generation of engineers and scientists is exposed to, and starts using, this technology in everyday life. The way to clarify how this technology applies to physics-based disciplines, and to demonstrate its useful and game-changing applications in engineering and industry, is to develop a new generation of engineers and scientists who are well versed in its application. In other words, the objective should be to train and develop engineers who understand data-driven analytics and are capable of applying it efficiently to engineering problem-solving.
Engineering and nonengineering problems differ. To gain the expertise to address and solve engineering-related problems, humans attend universities and earn engineering degrees. For example, to address reservoir-engineering-related problems that require a fundamental understanding of fluid flow in porous media, having reservoir engineers on your team is important because they are trained to solve such problems. Many nonengineering-related problems, by contrast, require only general, human-level intelligence and no domain expertise or deep understanding of physics, chemistry, mathematics, or mechanics. For example, a 5-year-old can tell the difference between a dog and a cat because doing so requires only general, human-level intelligence. Of course, this does not mean that general, human-level intelligence is easy to mimic using computers; that task is actually quite challenging. Nevertheless, major differences persist between engineering- and nonengineering-related problems, and these result in major differences between the engineering and nonengineering applications of AI and ML.
Traditional engineering problem-solving follows a well-known approach, usually based on the laws of conservation of mass, energy, or momentum. To build a physics-based model, an engineer observes the physical phenomenon to be modeled in order to identify the variables (parameters) involved in its behavior. The engineer then uses mathematical equations to capture the relationships between those variables. Finally, the engineer solves the resulting equations analytically or numerically, depending on their complexity.
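To make this workflow concrete, here is a minimal sketch of the observe-model-solve loop: a hypothetical single-tank material balance posed as a conservation equation and integrated numerically. All parameter values are assumed for illustration and do not represent any particular field.

```python
import numpy as np

# A conservation law posed as a differential equation, then solved
# numerically: a hypothetical closed-tank material balance
# dp/dt = -q / (c * V), integrated with an explicit Euler scheme.

c = 1e-5   # total compressibility, 1/psi (assumed)
V = 1e8    # tank pore volume, bbl (assumed)
q = 5e3    # withdrawal rate, bbl/day (assumed)

dt = 1.0                       # timestep, days
t = np.arange(0.0, 365.0, dt)  # one year of simulation
p = np.empty_like(t)
p[0] = 4000.0                  # initial pressure, psi (assumed)

for i in range(1, len(t)):
    # Conservation of mass for a closed, slightly compressible tank:
    # pressure declines in proportion to fluid withdrawn per unit storage.
    p[i] = p[i - 1] - q * dt / (c * V)

print(f"Pressure after one year: {p[-1]:.0f} psi")
```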
To explain the engineering applications of AI and ML, we need to address two questions:
- Can the human brain solve engineering-related problems? Examples include catching a ball that is thrown to you or walking through the streets of Manhattan (Fig. 1).
- When solving (handling) such engineering-related problems, does the human brain build mathematical equations and then solve them?
I hope that the answers to these questions are obvious. AI and ML mimic the human brain in order to solve engineering-related problems. Because the human brain does not use the traditional engineering problem-solving approach, the correct application of AI and ML to solve engineering-related problems relies on observations (data), not equations. The human brain learns through observation and repetition. It deals with imprecise and approximate (nonexact) data. It does not use binary (two-valued) Aristotelian logic.
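As a small illustration of the difference between binary Aristotelian logic and the graded, approximate reasoning the brain (and fuzzy set theory) uses, consider the sketch below; the membership function and thresholds are invented for illustration.

```python
# Two-valued logic vs. a fuzzy, graded notion of the same concept.
# The 70-100F ramp is an invented example, not a standard definition.

def is_hot_binary(temp_f: float) -> bool:
    # Aristotelian logic: a temperature is either "hot" or it is not.
    return temp_f >= 85.0

def is_hot_fuzzy(temp_f: float) -> float:
    # Fuzzy logic: "hot" is a matter of degree between 0 and 1,
    # ramping linearly from 70F (not hot at all) to 100F (fully hot).
    return min(1.0, max(0.0, (temp_f - 70.0) / 30.0))

for t in (65, 80, 95):
    print(t, is_hot_binary(t), round(is_hot_fuzzy(t), 2))
```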
The Future of Engineering
The industrial revolution simulated human muscle and required about two centuries to fully transform societies. The current biotech and AI revolution simulates the human brain, and it will require not centuries but only decades to change our societies. AI and ML have already started to change so much. Engineering-domain experts who become highly skilled AI and ML practitioners are the ones who will control the future of the engineering disciplines. Becoming an engineering-related AI and ML expert practitioner requires extensive experience using AI and ML to solve engineering-related problems. This will not happen in a short period, just as becoming an expert reservoir engineer requires much more than taking a Reservoir Engineering 101 course at a university.
Unfortunately, many engineers recently have started calling themselves “data scientists” right after reading a book or a few papers, watching a few YouTube videos, or listening to a few lectures. The reason behind such oversimplification seems to be based on two facts:
- Engineers are taught that, in order to handle the physics of a phenomenon, they need to understand the mathematics behind modeling that physics.
- The mathematics behind many AI and ML algorithms is very simple.
Therefore, the reason many engineers start calling themselves data scientists so quickly seems to be that, once they learn the mathematics behind the AI and ML algorithms, they conclude that they know everything that needs to be known as far as AI and ML are concerned. This is wrong. Such a misunderstanding of AI and ML seems to be the main reason behind the recent development of the so-called “hybrid models.”
The application of AI and ML is a complete paradigm shift in how engineering-related problems are addressed. Becoming an expert data science practitioner requires a comprehensive understanding of how the traditional engineering problem-solving approach must be modified. Understanding the mathematics behind the ML algorithms contributes less than 10% of what it takes to become a true data scientist, especially when it comes to engineering-related problem-solving. This is why it is a waste of time for the short courses taught on this topic to concentrate purely on the mathematics behind ML algorithms.
Engineering vs. Nonengineering Problem-Solving
So what is the difference between data science as applied to physics-based vs. nonphysics-based disciplines? When data science is used to address nonphysics-based problems, a combination of traditional statistics and AI and ML algorithms usually is used. The major requirements for solving such problems are expertise in AI and ML and large amounts of data; hardly any domain expertise is required. The application of data science to social networks and social media, consumer relations, demographics, or even politics does not require expertise beyond traditional statistics and AI and ML. In many such cases (nonphysics-based areas), the relationship between correlation and causation is not an issue. Usually, upon completing the analyses and identifying certain correlations, the statisticians (or AI experts) go to the scientists or domain experts and ask them to explain, from a psychological, sociological, or biological point of view, why such correlations might exist.
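As a minimal sketch of that workflow, the snippet below screens data for a correlation and leaves the causal question to the domain experts; both variables and their relationship are fabricated for illustration.

```python
import numpy as np

# Screen nonphysics-based data for correlations first; ask domain
# experts about causation afterwards. The synthetic variables below
# are invented solely to illustrate the workflow.

rng = np.random.default_rng(42)
screen_time = rng.normal(4.0, 1.5, 500)  # hours/day (fabricated)
sleep_hours = 8.5 - 0.4 * screen_time + rng.normal(0, 0.8, 500)

r = np.corrcoef(screen_time, sleep_hours)[0, 1]
print(f"Correlation: {r:.2f}")  # a clear negative correlation...
# ...but whether screen time *causes* less sleep is a question handed
# to the domain experts, exactly as described above.
```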
On the other hand, applying data science to physics-based problems such as multiphase fluid flow in reactors [computational fluid dynamics (CFD)] or in porous media (numerical reservoir simulation and modeling) is a completely different story. The interactions between the parameters of interest in physics-based problem-solving, despite their complex nature, have been understood to a large extent and have been modeled by scientists and engineers for decades. Therefore, treating the data generated from such phenomena, regardless of whether it comes from real measurements or from simulation, as just numbers to be processed in order to learn their interactions (as is done using traditional statistical tools) is a gross mistreatment and oversimplification of such problems.
These types of approaches hardly ever generate useful results. That is why many such attempts have resulted, at best, in unattractive and mediocre outcomes, so much so that many engineers (and scientists) have concluded that data science has few serious applications in industrial and engineering disciplines. Given this, some traditionalists may ask the following question: How can AI and ML contribute to industrial and engineering-related problems when the interactions between all the involved parameters have been understood and modeled for decades?
My answer is efficiency and accuracy.
AI and ML have a game-changing contribution to make to industrial and engineering-related problems. This technology will completely change the future of many industries through a transformational increase in the efficiency and accuracy of problem-solving. The contributions of AI and ML to many industries can be summarized in two classes:
- Class One: Minimization or avoidance of assumptions and simplifications in order to build highly realistic models of the physical phenomena.
- Class Two: Minimization of the computational footprint of numerical models such that they become realistic and practical tools.
Let’s discuss both of these classes of contributions of AI and ML in the context of the oil and gas industry.
Class One: Minimization or Avoidance of Assumptions and Simplifications
Building reservoir simulation models requires a large number of assumptions and mathematical simplifications. No reservoir engineer or geoscientist has ever been thousands of feet underground to visit, observe, touch, test, or analyze a hydrocarbon reservoir directly. We extract cores a few feet long and a few inches in diameter from wells in hydrocarbon reservoirs that are tens or hundreds of cubic kilometers in volume. Then, in the laboratory, we take an even smaller sample (2 in. long and about 1 in. in diameter) and perform a core analysis. Our well logs only measure about 6 in. away from the wellbore.
Looking at nature, we see that rocks are highly heterogeneous. Yet we measure permeability from only a few cores, and capillary pressure and relative permeability from far fewer, and then come up with one or two relative permeability curves that we apply to the entire reservoir. Then, we modify the relative permeability or transmissibility (i.e., a function of permeability, reservoir thickness, and relative permeability) and well skin (local modifications) in order to history-match the results of the numerical reservoir simulation model.
Furthermore, we simplify the complex second-order, nonlinear, partial differential equations into linear equations that apply to small volumes in space (the gridblocks) over short periods of time (small timesteps, especially during the transient period). One of the main reasons for solving these complex equations numerically is to avoid major simplification of the problem (substituting, instead, simplification of the solution) so that our interpretations of the reservoir’s heterogeneity can be honored. When it comes to unconventional reservoirs that produce because of massive hydraulic fracturing, numerical reservoir simulations use parameters such as fracture length, fracture height, fracture width, and fracture conductivity. In other, far more simplified cases, stimulated reservoir volume is used. These parameters are based completely on guesswork and gross assumptions and have absolutely nothing to do with reality. These are only a few examples of the assumptions and simplifications involved in numerical reservoir simulation.
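For reference, the kind of equation being simplified is, in its simplest single-phase form, the diffusivity equation, written here in a common textbook form; the multiphase equations solved by reservoir simulators add saturation-dependent and capillary-pressure terms.

```latex
% Single-phase flow of a slightly compressible fluid in porous media:
% k: permeability, \mu: viscosity, \phi: porosity, c_t: total
% compressibility, p: pressure. Nonlinearity enters when k, \mu, \phi,
% and c_t depend on pressure (and, in multiphase flow, on saturation).
\nabla \cdot \left( \frac{k}{\mu}\,\nabla p \right)
    = \phi c_t \frac{\partial p}{\partial t}
```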
Here is how AI and ML contribute to reservoir simulation and modeling: Data-driven reservoir modeling (not the so-called hybrid models) uses only field measurements to model the physics and build a reservoir simulation model. This technology, which uses a comprehensive combination of multiple ML algorithms (including spatiotemporal learning, deconvolutional neural networks, fuzzy set theory, genetic algorithms, and active learning, to name a few), avoids all assumptions and simplifications in order to build a reservoir model based purely on facts (field measurements). Top-down modeling (TDM), a completely data-driven reservoir modeling technology, is a coupled reservoir and wellbore model. TDM uses choke setting and wellhead pressure (not flowing bottomhole pressure) as input and generates oil production, gas/oil ratio, water cut, reservoir pressure, and water saturation as output. TDM’s history matching is completely automated and never uses local (well-based) modifications to achieve a history match. The time and resources required to develop, history-match, and deploy a comprehensive TDM for a mature field with hundreds of wells are less than 10–15% of those required for traditional numerical reservoir simulation models.
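To illustrate only the input/output framing described above (this toy sketch is not TDM itself, which couples multiple ML algorithms and spatiotemporal learning), the following trains a regressor to map choke setting and wellhead pressure to oil rate on fabricated data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Learn production behavior directly from "measurements" instead of
# from a gridded PDE model. The synthetic data below stands in for
# historical field records and is invented for illustration.

rng = np.random.default_rng(0)
n = 2000
choke_setting = rng.uniform(10, 64, n)         # choke size, 1/64 in. (assumed)
wellhead_pressure = rng.uniform(500, 3000, n)  # psi (assumed)

# Fabricated oil rate with noise, standing in for field history.
oil_rate = (2.0 * choke_setting + 0.05 * wellhead_pressure
            + rng.normal(0, 10, n))

X = np.column_stack([choke_setting, wellhead_pressure])
X_train, X_test, y_train, y_test = train_test_split(
    X, oil_rate, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")
```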
Class Two: Minimization of Computational Footprint
CFD and numerical reservoir simulation currently are used to solve complex problems. Realistic versions of such models usually include millions (or tens of millions) of cells and must use very small timesteps during the transient period. This results in an extensive computational footprint (many hours for a single simulation run), even when graphics processing units or high-performance clusters are used. Uncertainty quantification associated with reservoir characteristics or operational conditions, CO2-enhanced oil recovery, or waterflooding optimization and field-development planning in large, mature oil and gas fields with hundreds of wells requires the examination of hundreds of thousands, and sometimes millions, of scenarios. Large, complex numerical simulation models cannot provide the speed required for such tasks.
Traditionally, to accomplish such objectives, the physics of the model or the resolution in space and time is reduced (reduced-order models) or, sometimes, statistics are used to generate response surfaces. AI and ML can instead be used to develop a smart proxy model of the full-field, dynamic reservoir model that runs in seconds or minutes. Smart proxy models do not reduce the physics or the resolution in space and time. They are purely data-driven proxy models, generated from a handful of simulation runs, that reproduce the details of the numerical reservoir simulation at the gridblock level with very high accuracy. Smart proxy modeling makes numerical reservoir simulation a realistic and practical tool for reservoir management.
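The following is a minimal sketch of the smart-proxy idea under stated assumptions: a fabricated analytic function stands in for an expensive simulator, a handful of "runs" supply gridblock-level training data, and a regressor then predicts gridblock responses for new scenarios almost instantly.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fake_simulator(injection_rate, perm, dist):
    # Placeholder for an expensive simulation run: pressure response
    # at a gridblock as a function of scenario inputs and block
    # properties. Entirely fabricated for illustration.
    return 3000 - 40 * injection_rate * np.exp(-dist / (perm + 1.0))

rng = np.random.default_rng(1)

# "A handful" of training scenarios, each with many gridblocks.
rows = []
for injection_rate in rng.uniform(1, 10, size=8):  # 8 simulation runs
    perm = rng.uniform(10, 500, size=1000)   # gridblock permeability, md
    dist = rng.uniform(50, 5000, size=1000)  # distance to injector, ft
    p = fake_simulator(injection_rate, perm, dist)
    rows.append(np.column_stack(
        [np.full(1000, injection_rate), perm, dist, p]))

data = np.vstack(rows)
X, y = data[:, :3], data[:, 3]

proxy = GradientBoostingRegressor(random_state=0).fit(X, y)

# The trained proxy can now sweep thousands of scenarios cheaply.
print(proxy.predict([[5.0, 120.0, 800.0]]))
```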
Petroleum Data Analytics and Big Data
Many issues regarding the future of our industry need to be discussed in detail. Two that will be discussed soon are petroleum data analytics and the application of big data in our industry. Petroleum data analytics, the future of our industry, is an integration of the fundamentals of petroleum engineering with AI and ML. Another important item is the application of AI and ML to engineering-related problems that involve big data; AI and ML are used differently in problems that involve big data vs. those that do not. In the context of petroleum data analytics, big data arises in problems related to artificial lift, drilling, and fiber optics, whereas non-big data (i.e., large amounts of data, but not necessarily big data) arises in problems such as data-driven reservoir modeling of mature fields or the development of synthetic well logs.
Mohaghegh has authored three books (Shale Analytics, Data-Driven Reservoir Modeling, and Application of Data-Driven Analytics for the Geological Storage of CO2) and more than 170 technical papers and has carried out more than 60 projects for independents, national oil companies, and international oil companies. He is an SPE Distinguished Lecturer (2007–08) and has been featured four times as the Distinguished Author in SPE’s Journal of Petroleum Technology (2000–04). Mohaghegh is the founder of Petroleum Data-Driven Analytics, SPE’s Technical Section dedicated to artificial intelligence and machine learning. He has been honored by the US Secretary of Energy for his AI-related technical contribution in the aftermath of the Deepwater Horizon accident in the Gulf of Mexico and was a member of the US Secretary of Energy’s Technical Advisory Committee on Unconventional Resources in two administrations (2008–14). Mohaghegh represented the United States in the International Organization for Standardization (ISO) on carbon capture and storage (2014–16).