AI/machine learning

Pioneer's Analytics Project Reveals the Good and Bad of Machine Learning

A recent research effort has shown that the digital journey is full of stumbling blocks. Just like humans, advanced computing technology will get some things right and some things wrong.

One of Pioneer Natural Resources' field offices in West Texas. Source: Pioneer

Machine learning is supposed to help the oil and gas industry discover the important insights that human engineers keep missing. However, a recent project by Pioneer Natural Resources shows that engineers do indeed know their stuff and, while the machines are powerful and full of potential, they still need to get smarter. 

The Irving, Texas-based shale producer recently put five competing analytics vendors through a proof-of-concept challenge to see which among them could help improve the firm’s well completion strategies in the Permian Basin. Jace Parkhurst, a senior reservoir engineer with Pioneer who worked on the project, shared the mixed results on the final day of the Unconventional Resources Technology Conference (URTeC), in Houston.

His broad takeaway from the experiment: “We didn’t really learn anything spectacular from machine learning that we didn’t already know.”

The Good

That said, the project, which was based primarily on geologic data from about 1,600 of Pioneer’s Permian wells, produced outcomes showing that the current state of the technology is capable of validating some big completion decisions.

Perhaps most noteworthy is that the machine learning models agreed with the operator’s new plan to step up the size of its hydraulic fracturing treatments to generate more production.

For context, Pioneer’s latest “3.0+” completion design uses a base recipe of 60 bbl of water and 3,000 lb of sand per lateral foot; on a 10,000-ft lateral, for example, that works out to roughly 600,000 bbl of water and 30 million lb of sand, making these some of the biggest jobs the company has ever done.

In one exercise, more than 10,000 iterations of completion designs were modeled and the best performers, both on a production and economic basis, were those that called for larger volume fracturing jobs.
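To make that kind of search concrete, the sketch below samples a hypothetical completion design space and ranks the candidates with a stand-in scoring model. The parameter ranges, the `predict_production` surrogate, and the economics proxy are all invented for illustration; they are not Pioneer’s actual models or workflow.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
N_DESIGNS = 10_000  # mirrors the ~10,000 design iterations described above

# Hypothetical Permian completion design space; all ranges are illustrative.
designs = pd.DataFrame({
    "fluid_bbl_per_ft": rng.uniform(30, 80, N_DESIGNS),
    "sand_lb_per_ft": rng.uniform(1_500, 4_000, N_DESIGNS),
    "cluster_spacing_ft": rng.uniform(15, 60, N_DESIGNS),
    "stage_length_ft": rng.uniform(150, 300, N_DESIGNS),
})

def predict_production(d: pd.DataFrame) -> pd.Series:
    """Stand-in for a trained ML model scoring each design (made-up physics)."""
    return (0.5 * d["fluid_bbl_per_ft"]
            + 0.01 * d["sand_lb_per_ft"]
            - 0.2 * d["cluster_spacing_ft"]) * rng.normal(1.0, 0.05, len(d))

def npv_proxy(d: pd.DataFrame, production: pd.Series) -> pd.Series:
    """Toy economics: revenue from predicted production minus job cost."""
    job_cost = 2.0 * d["fluid_bbl_per_ft"] + 0.05 * d["sand_lb_per_ft"]
    return 30.0 * production - job_cost

designs["production"] = predict_production(designs)
designs["npv"] = npv_proxy(designs, designs["production"])

# Rank candidates on economics and production; with a scoring model like
# this one, the larger-volume jobs float to the top of the list.
print(designs.nlargest(10, ["npv", "production"]))
```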

“And that kind of jibed with what we thought internally,” said Parkhurst, who added that the company was surprised by, but is now testing, one model-based suggestion to increase its fracture stage lengths and tighten up perforation clusters in one of its development areas.

Without naming the technology vendors, or who came out on top, he also said that the models were “intelligent enough” to confirm that what drives performance in horizontal wells is fluid volume, sand loads, cluster spacing, stage lengths, and well spacing—and generally in that order.
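A driver ranking like that is the kind of output a tree-based model produces almost for free. Below is a minimal sketch, assuming a hypothetical well table with those five drivers as columns; the file name and schema are invented for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical well table; the file name and column names are invented here.
wells = pd.read_csv("permian_wells.csv")
drivers = ["fluid_bbl_per_ft", "sand_lb_per_ft", "cluster_spacing_ft",
           "stage_length_ft", "well_spacing_ft"]

model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(wells[drivers], wells["first_year_boe"])

# Impurity-based importances yield the kind of driver ranking described above.
ranking = pd.Series(model.feature_importances_, index=drivers)
print(ranking.sort_values(ascending=False))
```

Impurity-based importances like these are convenient but can be biased toward high-cardinality features, which is one reason permutation importance is often checked alongside them; that caveat becomes relevant later in this story.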

Perhaps more interesting, and more surprising, is that the fracturing fluid type (i.e., slickwater, crosslink gels, or hybrid fluids) was not shown to be a primary productivity driver.

The reasoning behind this was not elaborated on, but Andrew Sommer, a senior coordinator of technology strategy at Pioneer, commented that, “The way you build models has a big impact on certain features, so take it all with a grain of salt.”

Still, Pioneer has enough confidence in machine learning to begin using it for other important planning tasks such as generating well decline curves. This work is underway, but the research engineers were careful to point out that these new machine learning-based decline curves will be used only for budgeting purposes, and that traditional decline curve analysis, done by humans, will still be used for reserves estimation.
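For reference, the traditional analysis being retained for reserves work typically rests on the Arps decline equation, q(t) = qi / (1 + b*Di*t)^(1/b). The sketch below fits it to synthetic monthly rates with scipy; the numbers are made up, and an ML-based approach would instead learn the curve shape from offset wells.

```python
import numpy as np
from scipy.optimize import curve_fit

def arps_hyperbolic(t, qi, di, b):
    """Arps hyperbolic decline: q(t) = qi / (1 + b*di*t)**(1/b)."""
    return qi / np.power(1.0 + b * di * t, 1.0 / b)

# Synthetic monthly rates standing in for one well's production history.
rng = np.random.default_rng(0)
t = np.arange(1, 37)  # months on production
q_obs = arps_hyperbolic(t, qi=900, di=0.15, b=1.1) * rng.normal(1, 0.05, t.size)

# Fit qi, Di, and b; the bounds keep b in the usual 0-2 range.
(qi, di, b), _ = curve_fit(arps_hyperbolic, t, q_obs,
                           p0=[800, 0.1, 1.0],
                           bounds=([0, 0, 0.01], [5_000, 2.0, 2.0]))
print(f"qi = {qi:.0f} bbl/month, Di = {di:.3f} 1/month, b = {b:.2f}")
```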

The Not So Good

Though there were some wins, the models delivered a number of disappointing results, which were chalked up to a combination of insufficient data and the brittleness of the computing technology.

Using a subset of data from about 1,000 wells, the models were asked to estimate first-year production results and “did a pretty good job of predicting performance,” according to Parkhurst. But when given the same challenge using data from only 400 wells that tap into a different rock layer, the modeling products were much less accurate.

The conclusion, which may be humbling for smaller operators, was that data from 400 wells is probably not enough to build a good production forecasting model.

This underscores that the models (specifically those based on the random forest approach) are heavily dependent on large volumes of high-quality data. Without it, they tend to do a poor job of extrapolating trustworthy predictions. When seemingly large data sets are still not enough, it was advised that companies try supplementing their internal inputs with public data.
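One way to test whether a well count like 400 is sufficient is a learning curve: train on progressively larger subsets and watch where validation accuracy plateaus. Here is a minimal sketch, using synthetic data in place of a real well table.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Synthetic stand-in for a well data set; a real run would use completion
# and geologic features with first-year production as the target.
X, y = make_regression(n_samples=1_000, n_features=8, noise=25.0, random_state=0)

sizes, _, val_scores = learning_curve(
    RandomForestRegressor(n_estimators=200, random_state=0),
    X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="r2")

# If the validation score is still climbing at the largest size,
# the model is data-starved and more wells (or public data) would help.
for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:4d} wells -> validation R^2 = {score:.2f}")
```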

Other related shortcomings surfaced when the models largely underestimated the actual production of the company’s most recent completion designs. They also tended to over-predict production from some of Pioneer’s poorest producers.

The models were also confused by extraneous data. Pioneer saw that if well names, or the names of the wells’ home counties, were included, the models would rank those variables as key performance drivers, which was said to “short circuit” more quantifiable ones.
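That failure mode is easy to reproduce. In the sketch below, a unique identifier column (standing in for a well name) is handed to a random forest alongside a genuine physical driver; because the ID lets trees memorize individual training rows, it can soak up importance that belongs to the real variable. The data are synthetic and the size of the effect varies, but the remedy is the same: drop ID-like columns before training.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 500

# One genuine driver plus an ID-like column standing in for a well name.
df = pd.DataFrame({
    "sand_lb_per_ft": rng.uniform(1_500, 4_000, n),
    "well_id": np.arange(n),  # unique per row, carries no physics
})
y = 0.002 * df["sand_lb_per_ft"] + rng.normal(0, 10, n)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(df, y)
print(pd.Series(model.feature_importances_, index=df.columns))
# The unique identifier tends to grab outsized importance by letting trees
# memorize rows, "short circuiting" the real driver. Dropping ID-like
# columns, e.g. df.drop(columns=["well_id"]), before training avoids this.
```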

The models ran into more digestive problems when they tried to use other qualitative information from microseismic surveys and hydraulic fracture models.

Such data sets are often subject to interpretation and are therefore difficult to feed into models without context, so the research team overseeing the project ended up turning to petroleum engineering experts to help do this correctly.

Parkhurst said the lesson here was that operators taking similar analytics journeys should not over-rely on data science teams, or their emerging digital tools, to come up with all the answers by themselves.

“We tried that and it didn’t work very well,” he said, adding later, “I think the good news is that we’re all going to still have jobs in the future because machine learning is not going to be able to do everything.”