Data mining/analysis

Mining Daily Driller’s Reports Looking for Telling Patterns

There is a lot of information buried in drilling reports written every day, but little of it appears in computer databases.

Words extracted from daily drilling reports were analyzed by BP using a program that identified frequently used words and usage trends over time. These word clouds show how pump references become more prominent over time, suggesting driller concerns that might otherwise go unnoticed.

Numbers from daily drilling reports can be added to databases along with data from sensors on the rig that can be crunched in seconds by a computer. But the perspective of the drillers is only available to a person reading the reports. Turning those sometimes personal, always jargon-filled observations into a form a computer can analyze is a challenge the oil industry is beginning to take on.

BP created a system able to mine the daily reports filed by drillers for what a paper on the project (SPE 173429) describes as “insightful” information.

The project used natural language processing to seek out key words and phrases related to drilling, and pattern-recognition software to identify common concerns. The paper described it as an “automatic discovery of unobservable events.”
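As a toy illustration of that key-phrase step (not BP's actual system, and with invented pattern lists), a few regular expressions can flag report lines that pair an equipment term with a trouble word:

```python
import re

# Invented pattern lists for illustration; a production system would use
# a full NLP pipeline and a curated drilling vocabulary.
EQUIPMENT = re.compile(r"\b(mud pump|top drive|bop|drawworks)\b", re.IGNORECASE)
TROUBLE = re.compile(r"\b(leak\w*|vibrat\w*|fail\w*|erratic|repair\w*)\b", re.IGNORECASE)

def flag_lines(report):
    """Return report lines that mention equipment alongside a trouble word."""
    return [line for line in report.splitlines()
            if EQUIPMENT.search(line) and TROUBLE.search(line)]

report = """Drilled 12-1/4 in. hole to 9,850 ft.
Mud pump #2 pressure erratic, swapped swab.
Circulated bottoms up, no issues."""
print(flag_lines(report))  # → ['Mud pump #2 pressure erratic, swapped swab.']
```

Only the middle line trips both pattern lists; routine progress notes pass through silently, which is the point of automated screening.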

The results were displayed in the form of word clouds, graphic displays in which terms related to drilling performance appear larger the more frequently they show up in the reports. The goal is to begin using this unstructured data (documents not in the form of a computer database) to answer a pressing question:

“How can we pinpoint the progress of things that could impact the performance and the cost of the well” as it is being drilled, said Mohamed Sidahmed, a data analytics scientist at BP, who recently spoke about the three-well project at the SPE Digital Energy Conference and Exhibition in The Woodlands, Texas.

“You might be able to spot the most high-frequency events, but most of the time, there are subtle issues that do not reveal themselves obviously,” Sidahmed said. For example, during the test, the analysis highlighted driller comments about a mud pump 3 weeks before it broke down.

Language Learning

BP is the first oil company to report using these data-mining methods to monitor drilling. Its goal is to increase safety and efficiency. The techniques have previously been used to analyze trends on the Internet and for monitoring by intelligence agencies.

In the field of business analytics, pulling data from reports, emails, speeches or anything else that has been written falls into the realm of natural language systems. Finding useful information from technical papers, such as the SPE papers on the OnePetro website, is an area of interest for Currie Boyle, an IBM distinguished engineer in business analytics. In an industry where a new generation is taking over, he said this could be used by newcomers for “knowledge capture.”

“How do I get someone new up to speed?” he said. He wants to use unstructured data searches to teach newcomers “faster and more successfully” than by relying only on experience and mentoring.

But developing a program to extract answers from a technical paper presents problems ranging from programming the machine to learn the meaning of the evolving language of engineers—a Christmas tree is for valves, not gifts—to finding ways to sort out the best answer from papers written by experts who frequently disagree on how to explain well data.

Boyle compared this challenge to another of his interests, autonomous machines, such as driverless cars, and noted that “open unscripted natural language interfaces are much more difficult than autonomous control.”

But the potential of mining these documents is huge because there is so much unstructured information out there. IBM said that in most organizations, 80% to 90% of the information is unstructured. It is a young field that IBM traces back to 1997. The approach has been used to analyze call center traffic for early signs of product defects based on the questions asked.

“Structured information can give you answers to questions that you already know to ask,” explains Scott Spangler, a senior technical staff member in text mining and software development at IBM, who was quoted on the company website. “But what unstructured information can tell you is the answer to questions you didn’t even know you needed to worry about. It lets you know what you don’t know.”

There is room for growth in all sorts of data in the oil field. BP is mining the text of daily drilling reports seeking unnoticed problems. Baker Hughes is pulling the numbers from these reports to help feed software that it developed to compare the actual time required to drill wells with performance benchmarks for each task required to do so. It produces measures of nonproductive time and what the Baker Hughes paper (SPE 173413) called invisible lost time, which is work done more slowly than on comparable projects.

Reports on how long it takes to perform each step of the drilling process, plus the time when drilling is not progressing, are compared with performance indicators from similar wells.

For example, the Baker Hughes paper found that a crew drilling an offshore well was taking 8 minutes for pipe connections that normally take about 3.5 minutes, a gap that could add more than 93 hours if that pace were the norm for the project. The goal is to quickly identify and solve such problems before the time lost grows large.
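The invisible-lost-time arithmetic behind that example is simple to reproduce: the per-task gap between actual and benchmark time, multiplied by how often the task occurs. A minimal sketch (the function name is illustrative, not from the Baker Hughes software; roughly 1,240 connections at a 4.5-minute gap yields the 93-hour figure cited):

```python
def invisible_lost_time(actual_min, benchmark_min, occurrences):
    """Extra hours spent on a task versus its benchmark,
    scaled by how many times the task is performed."""
    return max(actual_min - benchmark_min, 0) * occurrences / 60.0

# Pipe connections: 8 min actual vs. a 3.5-min benchmark.
print(invisible_lost_time(8.0, 3.5, 1240))  # → 93.0 hours
```

Because the gap is computed per task, the measure can flag slow work even on a well whose total time still looks acceptable.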

Tough Test

The Baker Hughes system pulled numbers from drilling reports. Pulling words from the text of those daily reports is far harder to do.

While numbers and standard measurement units are strictly defined, words found in reports written by drillers have specific meanings that cannot be found in an online search. To fill that gap, BP’s information experts talked with drilling experts to build a dictionary of words and acronyms used on the rig in reports.
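Such a rig-vocabulary dictionary can be as simple as a lookup that expands acronyms into canonical terms before any counting is done. A minimal sketch with a few invented entries (the real BP dictionary was built with drilling experts and is far larger):

```python
# Tiny illustrative rig dictionary; real entries come from drilling experts.
RIG_TERMS = {
    "poh": "pull out of hole",
    "rih": "run in hole",
    "bha": "bottomhole assembly",
    "mp": "mud pump",
}

def normalize(report_line):
    """Lower-case a report line and expand known rig acronyms."""
    words = report_line.lower().split()
    return " ".join(RIG_TERMS.get(w, w) for w in words)

print(normalize("POH to repair MP #2"))
# → "pull out of hole to repair mud pump #2"
```

Normalizing first means that “MP,” “mp,” and “mud pump” all count as the same term in later frequency analysis.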

BP’s test required setting up a program to extract useful text from the daily drilling reports. The analysis weighted the terms based on their frequency of use and their relative importance.

The words in the reports were analyzed using an algorithm that built a “term frequency matrix” to show which words were becoming more prominent over time; in other words, which terms were trending. In the word cloud, the term for the mud pump grew larger over time.
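At its simplest, such a term frequency matrix is a table of word counts with one row per daily report, from which rising terms can be spotted. A sketch of that idea (the first-versus-last-count comparison here is far cruder than BP's analysis, and the sample reports are invented):

```python
from collections import Counter

def term_frequency_matrix(daily_reports):
    """One Counter of word counts per daily report (one row per day)."""
    return [Counter(report.lower().split()) for report in daily_reports]

def biggest_riser(matrix):
    """Term whose count grew the most from the first report to the last."""
    vocab = set().union(*matrix)
    return max(vocab, key=lambda t: matrix[-1][t] - matrix[0][t])

reports = [
    "drilled ahead no issues",
    "drilled ahead pump pressure fluctuating",
    "swapped pump swab pump pressure still erratic",
]
matrix = term_frequency_matrix(reports)
print(biggest_riser(matrix))  # → pump
```

The word “pump” never dominates any single report, but its count climbs day over day, which is exactly the kind of subtle trend a human skimming reports can miss.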

The same could be done by a regular reader, but these scattered references would have been easy to miss. Using digital analysis to change how wells are drilled requires convincing humans that the machine is a useful member of a “collaborative real-time environment.”

A question facing the effort is how to best present the results to workers in a way that effectively gets the appropriate level of attention.

A pattern is not a prediction, but it can point to areas worthy of further attention. The growing prominence of the mud pump in the word cloud suggested that drillers were concerned. Report analysis could have prompted a check of why it was mentioned so often and whether other available data pointed to a problem.

“The goal is to get past reacting to things based on real-time data,” Sidahmed said. This is part of a larger effort to use “data-driven learning techniques for decision making to support drilling deeper, longer, and more challenging wells.”