Datathon Connects Geothermal, Oil and Gas Communities

The 2021 Geothermal Experience Datathon focused on applying analytics and data-science tools to oil and gas well-log data to assess geothermal potential in two North American basins.

A datathon held from April to June 2021 focused on applying analytics and data-science tools to oil and gas well-log data to assess geothermal potential in two North American basins.

The 2021 Geothermal Experience Datathon (GTX 2021) was organized by a committee made up of members from the Society of Petroleum Engineers (SPE) Calgary and Gulf Coast Sections and Untapped Energy, a startup data-science organization. The primary objectives of the organizers were

  • To identify a problem statement relevant to the geothermal and oil and gas communities
  • To source the information needed for participants to explore the application of analytics and data science
  • To give participants access to educational content that was necessary for approaching the problem statement and useful for their own applications

A datathon is an event where participants gather to solve practical industry-relevant problems by working in teams to generate insights and potential solutions. It serves as a platform connecting participants, organizations, and businesses for the purposes of upskilling, transferring knowledge, and exploring new problem spaces. Datathons, and the closely related hackathons, have increasingly become an important means of fostering collaboration and orchestrating activities between different parties along shared interests because of the network effects generated by their participation.

“The 2021 Geothermal Experience Datathon … has been a great success,” said Silviu Livescu, chief scientist at Baker Hughes and SPE’s technical director for data science and engineering analytics. “Our amazing organizing committee comprising more than 20 dedicated volunteers from Calgary and Houston, and from the SPE Permian Basin Section, has planned this event over more than six months of weekly meetings with chiefly one goal: to show the importance of data science and analytics to the future of the energy industry, which allows the cross-pollination of skills from the hydrocarbon energy community to adjacent spaces such as the geothermal energy community.”

A series of discussions between subject-matter experts and the organizing committee yielded the following GTX 2021 problem statement:

Repurposing oil and gas wells for geothermal energy production represents an enticing opportunity to generate value from existing infrastructure while reducing the costs of exploration and development. Assessing the potential for geothermal conversion requires the estimation of bottomhole temperatures in prospective basins. One approach is to utilize the abundance of information available from drilling, completions, and production. Participants will develop a methodology and work flow for creating machine-learning models to predict the formation temperature based on well-log information to generate insights and identify prospective sites for two North American basins, the Duvernay and the Eaglebine.
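
As a rough illustration of the kind of baseline a team might start from, the sketch below fits a straight-line relationship between depth and temperature by least squares and reports the mean absolute error. It is not the method used by any GTX 2021 team: the function names are hypothetical, the single depth feature stands in for richer well-log inputs, and the numbers are synthetic values chosen only to make the example runnable.

```python
from statistics import mean


def fit_gradient(depths_m, temps_c):
    """Least-squares fit of temperature = intercept + slope * depth."""
    d_bar, t_bar = mean(depths_m), mean(temps_c)
    sxx = sum((d - d_bar) ** 2 for d in depths_m)
    sxy = sum((d - d_bar) * (t - t_bar) for d, t in zip(depths_m, temps_c))
    slope = sxy / sxx
    intercept = t_bar - slope * d_bar
    return slope, intercept


def predict(model, depth_m):
    """Predict temperature (deg C) at a given depth (m) from a fitted model."""
    slope, intercept = model
    return intercept + slope * depth_m


def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Synthetic, illustrative values only (NOT from the GTX 2021 data set).
train_depths = [1000.0, 2000.0, 3000.0, 4000.0]  # metres
train_temps = [40.0, 70.0, 100.0, 130.0]         # degrees Celsius

model = fit_gradient(train_depths, train_temps)
predictions = [predict(model, d) for d in train_depths]
error = mean_absolute_error(train_temps, predictions)
```

In practice, a model of this kind would be evaluated on wells held out from training, and additional well-log measurements would be added as features.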

The virtual event consisted of two components, learning and competition.

The learning component focused on digital skills and soft-skills development, together with introductions to geothermal energy for participants through a combination of hands-on instructor-led bootcamps and workshops. These were supplemented by a curriculum of online coursework curated by the organizers and the education platform provider.

In the competition component, participants formed teams to work on the problem statement using the proprietary data set provided by the data sponsor. Each submission was assessed by combining a quantitative score based on model accuracy with a qualitative score from a panel of judges, who rated each team's ability to communicate results to both a technical and a nontechnical audience. A major emphasis was placed on making the results of the competition open source, with code and presentations available on publicly accessible repositories.
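
The article does not say how the quantitative and qualitative scores were weighted or scaled. The sketch below shows one simple way such a combination could work, assuming both scores are already normalized to the range 0 to 1; the function name and the equal default weighting are hypothetical, not the organizers' actual formula.

```python
def combined_score(quantitative, qualitative, weight_quantitative=0.5):
    """Weighted average of two scores, each assumed to lie in [0, 1].

    The 50/50 default weighting is an assumption made for illustration.
    """
    if not 0.0 <= weight_quantitative <= 1.0:
        raise ValueError("weight_quantitative must be between 0 and 1")
    weight_qualitative = 1.0 - weight_quantitative
    return weight_quantitative * quantitative + weight_qualitative * qualitative


# Hypothetical example: a team with strong model accuracy and a weaker presentation.
example = combined_score(quantitative=0.8, qualitative=0.6)
```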

The event was opened to the public and advertised through established SPE communications channels along with social media posts. The target audiences included students and recent graduates, professionals employed in the oil and gas and adjacent industries, data scientists, software developers, and individuals looking to upskill or transition to different roles.

A group of sponsors and partners from both the public and private sectors supported the event through the provision of subject-matter expertise, material for the learning and competition components, platform services, and financial support. The active engagement between the organizing committee and partners throughout the event was crucial in delivering on the primary objectives.

A total of 13 sponsors and partners supported the event, including major oil and gas operators, service companies, data-solutions providers, educational-platform providers, geothermal-technology companies, and data-science-platform providers. Registration totaled 243, with participants joining from across the US, Canada, and 11 other countries. The learning component included 14 bootcamps and workshops, along with 3,307 hours spent on the sponsored online learning platform, with 802 courses and more than 46,000 exercises completed by participants. The data sponsor released a curated, high-quality, real-industry data set of more than 800 wells to participants, enabling the competition. In this component, 17 teams completed submissions that were scored by a panel of judges. In addition, a kickoff event and the opening and closing ceremonies featured talks by people in senior leadership positions at their respective companies.

Delivering the datathon programming and content revealed several key challenges. Because of the COVID-19 pandemic, the event was held entirely online, which required organizers, partners, sponsors, volunteers, and participants to rely heavily on communication platforms. This reliance affected the coordination of activities, a difficulty compounded by the distribution of those activities across geographical and organizational boundaries. The same communication platforms, however, together with experience adapting to new ways of working, enabled larger and more diverse participation than would have been possible at a local level.

Another challenge was the representation of sponsor and partner interests, including intellectual property considerations. These were alleviated by a transparent process of program delivery, contracts, and nondisclosure agreements.

The datathon has revealed the huge potential of collaboration to solve practical geothermal and data-science problems. The positive response of the community, sponsors, partners, and organizations involved in the planning and delivery of the GTX 2021 Datathon is a strongly encouraging sign of the opportunity for future events. In particular, the high adoption rate of the learning component shows that the educational program is a viable and attractive means of connecting participants to high-quality, in-demand skills training with extremely low barriers to entry. Moreover, the high-quality results generated by participants in the competition component showcase joint development possibilities for cross-domain applications in the geothermal, oil and gas, and data-science domains.

David Shackleton, SPE, is a certified petroleum data analyst and head of carbon data management for Independent Data Services, which specializes in operational reporting and analytics. He is committed to developing efficiencies in the energy industry as the world heads toward net-zero by 2050. Shackleton holds a BS degree in physics from the University of Durham, a teaching degree from the University of Cambridge, and an MEd degree from Endicott College. He is on the Board of Directors of SPE Calgary and chairs the Petroleum Industry Data Exchange Business Processes Work Group, overseeing the Emissions Transparency Data Exchange, working to get carbon emissions into the supply chain. Shackleton also sits on the boards of the nonprofit YYC Data Society and the data science education startup Untapped Energy.

James Ng serves on SPE’s Calgary Data Science Engineering Analytics SIG committee and holds both scientific and nontechnical committee roles for SPE Calgary, the International Association of Drilling Contractors, the Open Source Drilling Community, and Untapped Energy. He is currently a senior research scientist for drilling automation and intelligence solutions at Pason Systems.