Two weeks. 13 bootcamps. 150 participants. Over 10,000 oil and gas wells represented in the form of more than 1 million lines of data.
These were the key ingredients to a recent upstream-focused datathon—a data-intensive version of a hackathon—in which a group of petrotechnicals calling themselves the “Mighty DUC Hunters” claimed top honors.
Albeit on a small scale, their path to victory highlights how some engineers are navigating the uncertainty surrounding an industry downturn that is being dictated by a global pandemic.
Named after the shale sector’s fluctuating population of drilled-but-uncompleted wells, or DUCs, the “DUC Datathon” was organized by the SPE Calgary section and Untapped Energy, another nonprofit working to expand data science within the oil and gas business.
And while hosted in Calgary, the event drew people from around the world since every aspect of it was held virtually. Such is the norm these days, but perhaps not to be taken for granted.
The once full-time office workers who now must work together but separately from home do so in an effective manner thanks to widely available and often free software that keeps us all connected. Likewise, the ability to acquire basic data science skills today requires only a good internet connection and a strong personal commitment.
True to form, the members of the winning team had little to no coding experience before entering the competition.
“I’ve never done this before. My day job has been data collection, quality control, and cleaning—but never the in-depth analysis or interpretation of it,” said Tony Akhigbe, a completions engineer with a Calgary-based service company called Horizontal Wireline Services and one of seven people that formed the Mighty DUC Hunters.
A 20-year industry veteran, Akhigbe is learning about data science to become a better problem solver and to that end completed his first online courses in Python coding and Tableau only weeks before the competition began. He recalled the experience of taking on his first real machine-learning project as “a big shock to the system.”
Most of the Mighty DUC Hunters were strangers to each other before forming the team. To maintain social distancing, even those who live in Calgary never met face to face during the 2-week competition.
One teammate was in Regina, a 7-hour drive to Calgary’s east. Another did his entire bit while trapped on an extended vacation in Australia after international travel restrictions came into effect. In the end, none of this proved to be an obstacle big enough to keep the team from impressing their judges, all of whom are industry and analytics experts.
Working from Sydney, a mere 17 hours ahead of Alberta time, was Jeremy Zhao. He got involved only by chance after seeing that Akhigbe (whom he did not yet know) had posted about the competition on a public Slack channel created by a data science community back in his hometown of Calgary.
With only a couple of months of Python studies under his belt, the process engineer who before the downturn spent 10 years working in Alberta’s oil and gas fields described his coding skills to that point as “crude.”
Nonetheless, Zhao had more experience than most others on the team and was assigned a heavy portion of the programming duties—a role he embraced with gusto. “I think it’s just kind of like any oil and gas endeavor,” he said. “You get thrown into the fire and you adapt pretty quickly.”
The experience has encouraged Zhao to keep studying data science, but this is more than just a new hobby for him. While waiting to return home, he is considering finding work outside of the oil and gas business, which has shed tens of thousands of jobs globally since a pandemic-driven downturn began in March.
Zhao predicted, “If work continues to be slow for the rest of the year, which is kind of foreseeable, then that transition will continue to happen for myself and many of my colleagues.”
The fact that a group of inexperienced people won the competition came as no surprise to Brian Emmerson, one of the judges. A geophysicist by background, Emmerson is a data scientist at Petronas Canada and said the datathon showed that as the esoteric world of data science proliferates, it is also becoming more accessible to the uninitiated.
“There’s plenty of examples to demonstrate that a regular professional who can code and follow a few examples through has an order of magnitude more power in terms of what they can analyze now than say 5 years ago,” he said. “It’s quite staggering actually to see what that difference looks like.”
A “Genuine” Challenge
What Akhigbe, Zhao, and their cohort were asked to do in the datathon very much resembled a real project that any oil and gas company might consider doing itself. Before being set loose on their own, the participants were given some basic training and access to popular analytics platforms such as Salesforce’s Tableau and TIBCO’s Spotfire.
They would then need to apply their fresh understanding to a public data set of 10,000 wells and find the true vertical depth (TVD) of each one of those wells. The Mighty DUC Hunters got their TVD figures by combining two different algorithmic models to correlate total measured depth to the completion reports that showed where perforating began in those wells.
Calling it “a genuine challenge,” David Shackleton explained that such work is very similar to what operating companies and analysts do to figure out which wells are targeting which formations, even if some data are missing. Shackleton, an analytics manager at upstream data management firm Independent Data Services, was a lead organizer of the datathon through the SPE Calgary section’s Data Science & Engineering Analytics group.
The other half of the competition, and its namesake, asked participants to take that same data set and flag all the potential DUCs. Because of the wide interest these dormant assets hold across the shale sector, the organizers felt that making them the big focus of the datathon would ensure participants gained some practical experience.
“If the price of oil suddenly goes up by $10 overnight, tomorrow people will be rushing out into the fields to perforate these wells and then they might be producing within days or months,” said Shackleton, emphasizing the financial stakes that DUC activity represents.
He added that there may also be challenges in completing the oldest DUCs due to a degradation in wellbore quality, something that might motivate an operator to look beyond just the total count. “So, it’s pretty complicated from both an economic and engineering point of view,” he said.
But these are just examples. Aside from tallying up the number of DUCs in western Canada, the teams were challenged to draw their own conclusions about the importance of the DUCs and translate that message into a short video presentation, which ended up representing the bulk of each team’s score.
Emmerson said one of the most important lessons the experts wanted to convey to the green data enthusiasts is that building a so-called “solution” is not necessarily good enough to make a compelling case. To bring others along on the journey, you need to tell a “concise and reasonable” story, hence the heavy weighting applied to the presentations.
“These kinds of statistical techniques can mask a lot of errors,” he said. “You can get overconfident, and so delivering a clear communication about what you’ve actually done is one of the fundamental things that is often overlooked.”
Stumbling Upon The “DUC+”
To tell their story, the Mighty DUC Hunters wrote a series of rules to sift through the data, one of which was that any well not completed within about 2 months of being drilled should be considered a “true DUC.” Using a Python script put together from open-source libraries, Akhigbe, who led the presentation, said the team was able to “wrangle and mine” about 1,500 DUCs from the large data set.
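The team’s actual script was not published, but the 2-month rule can be illustrated with a minimal sketch. It assumes that “about 2 months” means 60 days and that a well counts as a DUC on a given date if drilling ended more than 60 days earlier and no completion had been recorded by that date. The `Well` record, its field names, and the sample wells are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "about 2 months" is read as 60 days; the team's exact cutoff is not given.
DUC_THRESHOLD = timedelta(days=60)


@dataclass
class Well:
    well_id: str                      # hypothetical identifier
    drill_end: date                   # date drilling finished
    completion_date: Optional[date]   # None if no completion is on record


def is_true_duc(well: Well, as_of: date) -> bool:
    """Return True if the well is drilled but uncompleted as of the given date."""
    # Wells drilled within the threshold window are still awaiting completion.
    if as_of - well.drill_end <= DUC_THRESHOLD:
        return False
    # Past the window: a DUC unless a completion was recorded by the as-of date.
    return well.completion_date is None or well.completion_date > as_of


# Made-up sample wells, for illustration only.
wells = [
    Well("A", date(2020, 1, 10), None),               # drilled long ago, never completed
    Well("B", date(2020, 1, 10), date(2020, 2, 1)),   # completed soon after drilling
    Well("C", date(2020, 4, 20), None),               # drilled too recently to count
]
as_of = date(2020, 5, 15)
ducs = [w.well_id for w in wells if is_true_duc(w, as_of)]
print(ducs)
```

Only well A is flagged here. A real data set would also need rules for missing or conflicting dates, which is likely where much of the “wrangling” effort went.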
Some of the unsurprising ties to the DUC count they found included the price of oil, the rig count, and seasonal activity cycles. The goal, though, was to dig a bit deeper and realize something not obvious.
To find at least one needle in the haystack, the Mighty DUC Hunters looked at DUCs in relationship to road, rail, and pipeline access. They then compared the past few years of financial records from the owners of the DUCs. None of this laborious investigative work uncovered a unique relationship.
Only right at the end of the project did a potential insight reveal itself to the team, something it decided to call a “DUC+.” “We found a lot of wells where, for reasons that are unknown to us, there was just a huge lag in production,” Zhao said in describing the more than 2,800 wells they designated as DUCs+.
The data showed that the operators of these wells had run brief flowback tests before shutting in the wells for extended periods. The team fixated on this aspect because there was concern over mistaking them for producers even though they lacked evidence of being completed via hydraulic fracturing.
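The article does not give the criteria the team used, so the following is only a sketch of how such a pattern could be screened for. It assumes monthly production volumes starting at first flow, a hypothetical flag for whether a hydraulic fracturing record exists, and arbitrary thresholds for what counts as a “brief” test and an “extended” shut-in.

```python
from typing import Sequence

# Assumptions (not from the article): monthly granularity and both thresholds.
MAX_TEST_MONTHS = 2      # a "brief" flowback test lasts at most this many months
MIN_SHUT_IN_MONTHS = 6   # an "extended" shut-in lasts at least this many months


def is_duc_plus(monthly_volumes: Sequence[float], has_frac_record: bool) -> bool:
    """Flag brief early flow followed by a long shut-in, with no frac record.

    monthly_volumes lists the produced volume per month, starting at first flow.
    """
    if has_frac_record or not monthly_volumes or monthly_volumes[0] <= 0:
        return False
    # Count the initial run of producing months (the presumed flowback test).
    test_months = 0
    while test_months < len(monthly_volumes) and monthly_volumes[test_months] > 0:
        test_months += 1
    if test_months > MAX_TEST_MONTHS:
        return False
    # Count the run of zero-volume months that immediately follows the test.
    shut_in_months = 0
    for volume in monthly_volumes[test_months:]:
        if volume > 0:
            break
        shut_in_months += 1
    return shut_in_months >= MIN_SHUT_IN_MONTHS


# Made-up volume histories, for illustration only.
tested_then_idle = [120.0] + [0.0] * 8
steady_producer = [120.0, 110.0, 100.0, 90.0, 80.0, 70.0, 60.0, 50.0]
print(is_duc_plus(tested_then_idle, has_frac_record=False))
print(is_duc_plus(steady_producer, has_frac_record=False))
```

A screen like this separates wells that merely flowed during a test from genuine producers, which is the misclassification the team was worried about.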
Regrettably, the team never found out the significance of its discovery since the presentation of the findings marked the end of the exercise. No one really got the DUC assignment right, though.
Shackleton recalled that one of the judges pointed out that none of the teams tried to suggest what to do with DUCs, as in when or why an operator should convert one or a cluster of wells into producers. “But you know, I bet that if we had given them another few days, that a lot of those teams could have given you that answer,” he said.
As for what’s next for the Mighty DUC Hunters, after splitting the $1,000 grand prize seven ways, Zhao and his teammates will gain some more perspective on what a career in data science might look like outside of the upstream industry. One of their rewards is a private meeting with the CEO of a machine-learning-based risk analysis firm that the event organizers noted is expanding its ranks.
But most of the team, including Akhigbe and Zhao, will stick together and compete later this year in their second upstream-focused datathon hosted in Norway. This time around they will be challenged to use their newfound data science skills to interpret well logs.