Smaller and independent upstream companies often have limited resources for data management. Nonetheless, their data is valuable, and it must be managed for that value to be realized. Geologists may just be in the perfect position to do the job, if they can get the training.
“Geologists are not usually trained in data management,” said Lori Bryan, a petroleum geologist with 15 years of experience, “but the role often falls to them because they rely upon a wide variety of data from multiple sources to do their jobs. … When a geologist starts a new project or expands an existing one, they need all the available data in that area.”
Bryan, who has worked for Winright Resources, IEC Corporation, and Range Resources and is currently pursuing her master’s degree at Texas A&M University, spoke at a recent event held by the Professional Petroleum Data Management Association. She said that the lack of data-management training for geologists, combined with a lack of data standardization, can lead to ad hoc databases, confusion, and costly mistakes.
Compounding the data-management problem are two important facets of data: variety and veracity. Geotechnical data comes in many varieties, both structured (e.g., subsurface, drilling, and production data) and unstructured (e.g., wireline logs, daily reports, and leases).
“This data also comes from several sources, for instance operating partners, regulatory agencies, data vendors … or even physical well files,” Bryan said. “I personally once discovered a lost 3D seismic shoot by finding paper copies of 3D lines in a well file.”
The veracity of data, on the other hand, refers to its quality and usefulness. “Unfortunately,” Bryan said, “data from these various sources have data-quality issues.”
These quality issues can be easy to catch, such as typos or duplicates, or difficult to catch, such as readings from faulty tools or poor seismic acquisition. Regardless, all the data must be collected and formatted before it can be used, and that can take time.
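The "easy to catch" issues lend themselves to simple automated checks. As a minimal sketch, the script below flags duplicate well records whose API numbers differ only in formatting; the record layout and field names are hypothetical, chosen for illustration rather than taken from any particular database.

```python
# Hypothetical quality check: flag duplicate well records by API number
# after normalizing the formatting, so entries like "42-501-20130" and
# "4250120130" are recognized as the same well.

def normalize_api(api: str) -> str:
    """Strip dashes and whitespace so differently formatted API numbers match."""
    return "".join(ch for ch in api if ch.isdigit())

def find_duplicates(records):
    """Return a dict of normalized API number -> list of records seen more than once."""
    seen = {}
    for rec in records:
        seen.setdefault(normalize_api(rec["api"]), []).append(rec)
    return {api: recs for api, recs in seen.items() if len(recs) > 1}

wells = [
    {"api": "42-501-20130", "operator": "Acme"},
    {"api": "4250120130", "operator": "ACME Resources"},  # same well, formatted differently
    {"api": "42-501-20144", "operator": "Beta"},
]
dupes = find_duplicates(wells)
```

A check like this catches formatting duplicates, but the hard cases Bryan mentions (faulty tools, poor acquisition) still require a geoscientist's judgment.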
“Most people in the industry, especially managers, think that a geologist’s time is better spent on data analysis rather than data-management practices,” Bryan said, “but, if these issues aren’t caught in time, they can lead to some pretty costly errors, specifically hitting an abandoned wellbore with a lateral or drilling a dry hole.”
Another challenge for the geologist turned data manager is discrepancies in nomenclature. “For example, producing horizons are often named locally, and that same unit can have a different name in another field or even just a township away,” Bryan said. “Additionally, there is no standard abbreviation or spelling for these units.”
Bryan also pointed out that different wireline companies use different tools, with different names and abbreviations for the curves they create, and the abbreviations sometimes differ even within an individual vendor’s own data set. “Adding on to that are any typos that must be corrected with a script, which can lead to a hundred different names for a single curve alias.”
All these hurdles mean that the person writing the data-cleaning script must be familiar with the geology. “A person who isn’t experienced with petrotechnical data is going to have a difficult time choosing the correct alias,” Bryan said. “This means that writing the script will be labor- and time-intensive on the front end, but, in the long run, it will be a good use of a geoscientist’s time because they are often the ones with the expertise to properly manage this data.”
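The kind of alias-mapping script Bryan describes can be sketched as a lookup table that collapses many raw log mnemonics onto one canonical curve alias. The mnemonics and aliases below are illustrative only; in practice, the table is exactly where a geoscientist's expertise comes in, since choosing the wrong alias silently corrupts every downstream analysis.

```python
# A hedged sketch of a curve alias-mapping script. The mnemonic-to-alias
# table is illustrative, not an industry standard; a geoscientist would
# curate it for their own vendors' data sets.

CURVE_ALIASES = {
    "GR": "GR", "GAM": "GR", "GAMMA": "GR",          # gamma ray variants
    "DT": "DT", "DTC": "DT", "DTCO": "DT", "AC": "DT",  # compressional sonic variants
    "RHOB": "RHOB", "DEN": "RHOB", "ZDEN": "RHOB",   # bulk density variants
}

def canonical_curve(mnemonic: str):
    """Map a raw log mnemonic to its canonical alias.

    Returns None for unrecognized mnemonics so they can be routed to a
    geoscientist for review instead of being guessed at by the script.
    """
    return CURVE_ALIASES.get(mnemonic.strip().upper())
```

Returning `None` rather than a best-guess match reflects Bryan's point: the script handles the rote normalization, but an inexperienced person (or an over-eager script) choosing aliases is exactly how costly errors creep in.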
Bryan said the best solution to the problems of data variety and veracity is for a company to create a data-management position or outsource the data-management job. “If neither option is feasible,” she said, “the geologist must be trained to properly collect and store the data in a centralized database in a way that all the disciplines can have access to the data. Even the simplest algorithms can learn impressive insights from data that has been properly scrubbed. It is well worth the cost and capital up front to allow the geologist to use their time analyzing and interpreting the information to make data-driven decisions.”