
Oil and Gas Has a Problem With Unstructured Data

Unlike structured data, unstructured data is information that either does not have predefined labels or is not organized in a predefined template, and inefficient management of this data is holding back the industry.


Inefficient unstructured-data management is holding back the oil and gas industry. Ashild Hanne Larsen, the chief information officer and senior vice president for corporate information technology at Equinor, says that 80% of employee time in the industry is spent looking through unstructured data to inform decisions. The International Data Corporation puts that number at 30% across all industries, so oil and gas workers spend nearly three times the general average searching their own documents for the information they need to do their work. In effect, four out of every five working days in the industry are devoted to researching unstructured data. Is the industry ready to face the problem and usher in new efficiency?

What is meant by unstructured data? Think of emails, old reports, operating manuals, or compliance documents. Unlike structured data, unstructured data is information that either does not have predefined labels or is not organized in a predefined template. Unstructured data is typically text-heavy but may also contain other data formats such as dates or numbers. It is also typically generated by people, as opposed to sensors or computers. This results in irregularities and ambiguities that make it harder to research or manipulate with traditional software programs than the structured data stored in fielded forms within databases.
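To make the distinction concrete, the short Python sketch below contrasts a structured record with the same facts written as free text. The well name, field names, and report wording are invented for illustration and do not come from any operator's data.

```python
import re

# Structured: every value sits under a predefined label (hypothetical fields).
structured_record = {"well": "A-12", "date": "2016-03-04", "pump_pressure_psi": 2150}

# Unstructured: the same facts as a person might write them in a report.
unstructured_note = (
    "On 4 March 2016 the crew on well A-12 saw pump pressure climb to "
    "about 2,150 psi before we decided to shut in."
)

def pressure_from_record(record):
    """Structured lookup: one key access, no interpretation needed."""
    return record["pump_pressure_psi"]

def pressure_from_text(text):
    """Brittle extraction: works only if the author wrote '<number> psi'."""
    match = re.search(r"(\d[\d,]*)\s*psi", text)
    return int(match.group(1).replace(",", "")) if match else None

print(pressure_from_record(structured_record))
print(pressure_from_text(unstructured_note))
# A differently worded note defeats the pattern and yields None.
print(pressure_from_text("Pressure was roughly two thousand pounds."))
```

The structured lookup never depends on how a person chose to phrase something, whereas the text extraction fails as soon as the wording changes. That is the kind of irregularity described above.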

So why is the oil and gas industry lagging so far behind in time spent researching unstructured data? Discussions with operators revealed four major reasons. The first is the sheer technical complexity of the oil and gas industry. 

The second reason is that the process-driven nature of the oil and gas industry necessarily leads to more unstructured data. The industry has many precedents that could illuminate current problems, but not all of the information can be captured in structured formats. For example, how decisions were made, what details were noticed, and what options were considered are commonly recorded by people in written summaries and reports precisely because that unstructured data could be of value the next time a similar issue arises.

The third reason is that the consequences of underinformed decision-making can be catastrophic. Successful serial entrepreneur Steve Garrity is now working on a startup in Silicon Valley to help software developers. Garrity explained, “In software development, engineers can often assemble and test ideas directly in the same amount of time it would take for them to research and discuss their viability beforehand. The testing environment can be designed to closely resemble the live environment, so conclusive results are quickly obtainable. Even if the idea fails in the live environment, the consequences are relatively low-stakes: perhaps a server goes down or a client gets angry. Compare that to the oil and gas industry where physical infrastructure can get destroyed, people can get hurt, or the environment can suffer damage. Of course people are going to document and reference their decisions a lot more in oil and gas.”

Fourth, the oil and gas industry has been faced with the big crew change. Tracking membership of the Society of Petroleum Engineers between 2006 and 2016, a close proxy for global professional employment, clearly shows the shifting demographic (Fig. 1). As these people retire, they leave behind an enormous expertise gap of roughly 20 years on average. The new employees need to spend additional time referencing information that their predecessors left behind.

Fig. 1—The shift in the age distribution of SPE’s professional membership between 2006 and 2016, which is shown in the graph, closely tracks the shift in the same span among professionals in the industry as a whole. Source: SPE.

The oil and gas industry needs operational and work-flow efficiency as much now as it ever has, but its focus remains elsewhere. Discussions with oil and gas operators revealed three challenges slowing the industry’s focus on outsourced solutions to unstructured data management. The first is a mindset of inevitability. An assumption presented during the discussions is that, without subject-matter experts to help answer questions, workers are left with no choice but to browse through their surviving documentation to find the information needed to make decisions.

When asked what problems are a priority, oil and gas leaders mention issues related to an aging workforce as well as process reliability and project timeliness, but they do not associate the overarching efficiency challenge with ease of navigation of the firm’s unstructured data. Business research and advisory company Gartner, on the other hand, predicts that data volume will grow 800% over the next 5 years and that 80% of the new data generated will be unstructured. Gartner also reports that the growing unstructured-data problem has less to do with storage than with accessibility.
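A quick back-of-the-envelope calculation shows what those two figures imply together. The sketch below assumes that "grow 800%" means the volume added over the period is eight times today's volume, and that the 80% share applies only to the newly generated data. Both are interpretations of the prediction, and the starting volume of 100 units is arbitrary.

```python
def project_unstructured(current_volume, growth_pct=800, unstructured_share=0.80):
    """Return (total_volume, new_volume, new_unstructured_volume)."""
    new_volume = current_volume * growth_pct / 100      # data added over the period
    total_volume = current_volume + new_volume          # volume at the end of the period
    new_unstructured = new_volume * unstructured_share  # unstructured part of the new data
    return total_volume, new_volume, new_unstructured

total, new, new_unstructured = project_unstructured(100)
print(total, new, new_unstructured)
```

Under these assumptions, the newly generated unstructured data alone would be 6.4 times today's entire data volume.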

The second challenge is the industry’s lack of experience working and sharing data with software providers. Oil and gas operators traditionally have worked with highly specialized service companies or with highly specialized branches of larger service companies offering turnkey solutions. The technical and logistical complexity of oil and gas operations requires client-specific focus, so oil and gas operators are rightfully wary of services claiming to provide a single tool that can solve their problems along with problems faced by other firms or even other industries. Some niches within the industry are more effectively handled by industry-specific software firms. The industry certainly uses general software tools such as email service providers and cloud storage services, but it also uses specialized engineering software tools. These various types of software already handle the operator’s proprietary data, but that fact is baked into services so conventional and standardized that it is overlooked.

The industry has little experience negotiating data sharing with software providers so small that they could not be sued for the value of the data if a breach occurred. As such, the most successful software startups in the oil and gas industry tend to be those that do not require access to the clients’ data or are themselves security companies. Software startup Rigup provides a marketplace for operators and workers to find one another, and it has experienced explosive growth. Another new software company in the industry, Drillinginfo, manages data, but it handles public data. BP’s recent investment in unstructured-data-management startup Belmont shows the potential for the industry to overcome traditional skepticism.

The third challenge is the tendency of operators to focus on obsolete or failing traditional solutions to the unstructured-data-management problem. Many firms still implement exhaustive document reorganization efforts, renaming and retagging files and sorting them across new folder hierarchies. This process is time consuming and subjective, and it tends to yield poor results. According to one subject-matter expert in an oil and gas major, the firm hired a top management consultancy to help design the folder hierarchy and document naming and tagging system. The project was abandoned after 100 hours had been spent attempting to sort out all the documents. Even if the project had been finished, the subject-matter expert said that it would merely amount to “reshuffling the deck.”

The other traditional solution that fails is standard off-the-shelf enterprise search, which tends to pull up large quantities of documents based on keywords entered into the search bar. In a survey of 51 people across 30 oil and gas operators and major service providers, only 6% said they are satisfied with the results they get from their enterprise search tools. Most cite problems with too many search results and low relevance of the documents or information returned.
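The complaint about too many results with low relevance can be illustrated with a toy example. The sketch below is not any vendor's search product: the document names and text are invented, and the scoring is a deliberately simple term-frequency count standing in for real relevance ranking. It shows that plain keyword matching returns every document that mentions a term, while even a crude score puts the document that is actually about the topic first.

```python
from collections import Counter

# Hypothetical mini-corpus; names and text are invented for illustration.
documents = {
    "pump_failure_report": "pump failure root cause pump seal wear pump vibration",
    "weekly_meeting_notes": "budget review staffing pump delivery delayed",
    "hse_bulletin": "safety reminder pump area signage hard hats",
}

def keyword_match(query, docs):
    """Return every document containing at least one query word."""
    words = query.lower().split()
    return [name for name, text in docs.items()
            if any(w in text.split() for w in words)]

def ranked_match(query, docs):
    """Rank matching documents by how often the query words appear."""
    words = query.lower().split()
    scores = {}
    for name, text in docs.items():
        counts = Counter(text.split())
        score = sum(counts[w] for w in words)
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

print(keyword_match("pump failure", documents))
print(ranked_match("pump failure", documents))
```

With three documents the unranked list is manageable. Across an operator's full archive, the same behavior produces the flood of marginally related hits that the survey respondents describe.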

Overall, unstructured-data management presents a major and growing problem within the oil and gas industry, and industry-focused software providers are the most likely solution to the problem. The market timing of that solution, however, depends on how quickly awareness of the problem grows. Given the level of competition in the industry today, it is likely that the adoption of an effective solution by one operator within the industry will lead to a spike in demand from others. 


Alec Walker is the cofounder and chief executive officer of the natural-language-processing and data-analytics firm DelfinSia in Houston. Delfin helps the oil and gas industry extract value from unstructured data. Walker has led digital-transformation and internal entrepreneurship projects for a variety of leading organizations including Intel, Inditex, AECOM, and General Motors. He has worked for Shell as a technical service engineer in refining, a tech tools software product manager, and a reservoir engineer in unconventional oil and gas. Walker holds a BS degree in chemical engineering from Rice University and an MBA degree from the Stanford Graduate School of Business.