Data science is a broad and sometimes confusing term, so you have likely heard many different definitions of it. A good general definition is “data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured.” While the techniques and underlying principles of data science have been around for decades in disciplines such as statistics, computer science, machine learning, and probability theory, it is only recently that data science as a unifying umbrella has received unprecedented attention and popularity, and rightfully so.
A recent famous example is the 4-1 defeat of the reigning Go champion by Google’s DeepMind team. Go is a board game orders of magnitude more computationally complex than chess; therefore, brute-force computational solutions are not yet viable for Go. Until recently, it was generally thought that a solution to Go was at least 10 years away. However, DeepMind’s approach became possible due to access to huge amounts of training data, access to Google’s very large graphics processing unit (GPU) clusters, and significant advances in deep-learning neural networks over the past decade. This surge in popularity has, among other things, resulted in significant venture capital funding of data science startups, and data science has become a key value driver for larger companies. Furthermore, the oil and gas industry, while generally lagging in the uptake of new technology, has also seen a surge in activity and applications of data science.
Traditionally, the oil and gas industry has been collecting a variety of data, such as production data, log data, geological data, completion data, artificial lift data, maintenance records, and data from permanent downhole sensors. However, more often than not, most of these data are not used to the fullest extent possible, and almost certainly not for proactive decision making. For example, water breakthrough in waterfloods can be a significant problem, resulting in reduced sweep efficiency and ultimate recovery. But, due to reservoir heterogeneity, it is generally hard to predict. Thus, operators mostly resort to reactive control, i.e., trying to remediate the well after it has broken through. These reactive solutions are usually “too little, too late,” and can be quite costly.
The oil and gas industry is certainly aware of this problem and has, over the last decade or so, emphasized “digital oil fields” or “smart fields,” which are implementations of the concept of “closed-loop reservoir management.” This concept, in essence, is to continuously maximize the life cycle value of oil and gas assets through real-time monitoring, continuous updating of predictive models with the latest data, and continuous optimization of multiple long- and short-term decisions, ranging from preventive well maintenance to optimal water redistribution in a waterflood.
While the data collection/surveillance part of the closed loop has improved significantly over the past decade, data integration, modeling, and optimization are still lacking. One of the reasons for this is that traditional modeling workflows are not easily amenable to this concept. Such workflows generally range from very simple analytical models (type curves, etc.), which are useful for rough “ballpark” estimates, to very complex reservoir simulation models, which can, at least theoretically, be used for quantitative optimization. However, the exorbitant time and effort required to build and calibrate these computationally complex simulation models prohibit their practical use for closed-loop reservoir management.
Data science can help in moving from reactive remedial solutions to proactive decision making and closed-loop reservoir management. Data science enables this through integrating different kinds of data into predictive models (predictive analytics), which can then be used to predict future reservoir/well/surface facility behavior, and then optimize such decisions to maximize asset value. This additional optimization and decision making step then leads to “prescriptive” analytics, that is, providing specific recommendations to the operator to solve a specific problem.
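The step from predictive to prescriptive analytics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the historical injection and oil rates are made up, the square-root response model is an arbitrary choice, and the price and cost figures are hypothetical. It is not a description of any particular company's workflow. The sketch first fits a simple model to past data (the predictive step) and then searches over candidate injection rates for the one with the highest predicted net value (the prescriptive step).

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical history: water injection rate vs. oil rate (both in bbl/d).
injection = [100.0, 400.0, 900.0, 1600.0, 2500.0]
oil = [60.0, 110.0, 160.0, 210.0, 260.0]

# Predictive step: assume (for illustration only) that oil rate is
# linear in the square root of the injection rate, and fit that model.
slope, intercept = fit_line([math.sqrt(q) for q in injection], oil)

def predict_oil(q_inj):
    """Predicted oil rate for a given injection rate."""
    return slope * math.sqrt(q_inj) + intercept

# Prescriptive step: recommend the injection rate that maximizes
# predicted daily net value. Price and cost are hypothetical.
OIL_PRICE = 50.0   # $/bbl of oil produced
WATER_COST = 3.0   # $/bbl of water injected

def net_value(q_inj):
    return OIL_PRICE * predict_oil(q_inj) - WATER_COST * q_inj

# Search only within the range covered by the historical data.
candidates = range(100, 2501, 100)
best_rate = max(candidates, key=net_value)
print(f"Recommended injection rate: {best_rate} bbl/d")
```

In practice the predictive model would integrate many more data types and the optimization would cover many wells and constraints, but the structure is the same: a model that predicts, followed by a search that recommends.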
Oilfield Data Science Organizations in Silicon Valley
There are a few universities, startups, and multinational corporations currently trying to apply data science and predictive modeling to several aspects of oilfield management, most making their base in the “prolific data science” home of Silicon Valley in the San Francisco Bay Area of California.
Stanford University lies at the heart of Silicon Valley and is home to some of the best and brightest minds working on smart oil and gas field projects. The Energy Resources Engineering department’s Smart Field Consortium hosts researchers working on areas that include optimal well placement; reservoir model order reduction; and gradient-based history matching using clever proxy models often derived from reservoir, production, and geological data.
University of Southern California recently launched its first ever master’s degree in Smart Oilfield Technologies. This degree program was created based upon industry’s request to train existing staff and new hires with skills related to the operation of smart fields. The new USC Center for Interactive Smart Oilfield Technologies, CiSoft, sponsored by Chevron, will provide facilities for research and training in these technologies, offering cutting-edge course content for this degree.
One of the older, more established petroleum data science startups in Silicon Valley is Palantir, which works across many verticals including oil and gas. Palantir has worked with companies such as Shell to provide enterprise planning and analytics software that enables collaborative, systematic, and secure data-driven workflows. The software helps companies collect, process, and analyze their investment portfolios, enabling planners to quickly produce insightful, accurate business strategies that optimize the future value of their resources.
Another Silicon Valley startup with a slightly different flavor is Kaggle. The company started as a data science competition company, providing a platform for connecting thousands of data science practitioners to companies through data science competitions. This approach enabled companies to find solutions to difficult predictive modeling problems by utilizing a huge network of experts, sometimes from totally unrelated fields. Kaggle also focused on a few verticals, including oil and gas, as it realized that oil and gas can gain significantly from machine learning. In oil and gas, it focuses on shale plays, using geological data to predict the best acreage to lease and to drill.
Tachyus is a relatively new startup based in Silicon Valley and primarily focused on oil and gas. It has pioneered a new modeling paradigm called “data-physics.” Data-physics is the amalgamation of the state of the art in machine learning/data science and the same underlying physics present in reservoir simulators, leading to predictive models that alleviate issues with traditional modeling approaches. These models can be built as machine learning models, evaluate orders of magnitude faster than full-scale simulation models, and aim to improve long-term predictive capacity. As such, data-physics models bridge the gap between data science and traditional reservoir modeling and enable closed-loop reservoir management.
Back in 2011, when Texas-based oil and gas service firm Baker Hughes opened its Palo Alto Innovation Center, machine learning was not really on the radar of most industry players. Last year, Baker Hughes rolled out software that, among other things, analyzes conditions around the electric submersible pumps that help lift oil out of wells and alerts well operators in advance of a potential problem. The software crunches data on historical pump performance and other variables, such as mud pressure and temperature. Additionally, Schlumberger’s Software Technology Innovation Center in Silicon Valley targets a wide array of software solutions, including big data analytics.
Conglomerates like GE and IBM have also started data science efforts in oil and gas. For example, GE is working in partnership with BP to ensure the efficient operation of critical rotating machinery on BP’s production facilities. By analyzing sensor data such as vibration, rotor position, temperature, pressure, flow, and other parameters, GE is able to identify changes in the operating condition of a machine or determine that the machine is no longer performing at its optimal capacity. Identifying the early onset of abnormal operating conditions minimizes disruption and avoids unnecessary periods of downtime that often result in lost production or increased costs.
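The idea of flagging abnormal operating conditions from sensor data can be sketched in a few lines. The example below is not GE's or Baker Hughes's method; it is a deliberately simple stand-in that compares each new reading with the mean and standard deviation of a trailing window of recent readings. The function name `flag_anomalies`, the window size, the threshold, and the vibration values are all hypothetical.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard
    deviations of that window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip perfectly flat windows to avoid dividing the world
        # into "anomalies" on the basis of zero spread.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical vibration amplitudes from a rotating machine (mm/s).
vibration = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 2.0, 1.9, 2.1, 3.5, 2.0, 2.1]

for index in flag_anomalies(vibration):
    print(f"Reading {index} ({vibration[index]} mm/s) looks abnormal")
```

A production system would combine several sensors and use models trained on historical failures, but the principle is the same: learn what normal looks like and raise an alert when new data depart from it.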
While it is always difficult to predict the future (no pun intended), it is abundantly clear that data science in oil and gas is here to stay. For young professionals in this industry dealing with data, it is prudent to add a data science skill set, as such skills will enable them to make better, faster decisions, broaden their career choices, and ultimately improve the company’s bottom line. And never has acquiring such skill sets been easier, with companies like Coursera and Udacity providing online “nano-degrees” in data science, and prestigious universities like Stanford opening up courses in data science to the masses online. So it is up to you to take advantage of this opportunity, or get lost in the sea of change.