Data & Analytics

Unlocking ROI From AI: From Metrics to Execution in the Energy Industry

A discussion at the inaugural executive breakfast convened by the SPE Data Science and Engineering Analytics Technical Section, held alongside CERAWeek by S&P Global and powered by Black & Veatch, tackled the challenge of value creation from artificial intelligence in the energy industry.

Oil pumpjacks with financial charts
Source: Alex Secret/Getty Images.

Artificial intelligence (AI) is producing real benefits across the energy industry, yet many organizations still struggle to translate pilots into measurable returns at the core of the business. The question is no longer whether AI works but how companies define, deliver, and sustain return on investment (ROI) at scale.

Technical Section Editorial

JPT’s Technical Section Editorial series features insights from committee members across SPE’s technical sections. Articles examine technical priorities, key activities, and emerging challenges within specific disciplines, providing SPE members with clear insight into how industry experts and volunteers are helping define SPE’s technical direction. Collectively, the series reflects the depth of SPE’s technical community and its continued commitment to advancing knowledge-sharing across the upstream energy sector.

Learn more about the SPE Data Science and Engineering Analytics Technical Section on the DSEATS SPE Connect Page.

To address that question, the SPE Data Science and Engineering Analytics Technical Section (DSEATS) convened the inaugural “Unlocking ROI From AI” executive breakfast on 26 March at The Houston Club in downtown Houston. The invitation-only forum was held alongside CERAWeek by S&P Global and powered by Black & Veatch, bringing together 20 senior leaders from operators, service companies, and technology providers for a 2-hour, Chatham House Rule dialogue. The format was deliberately light on presentations and heavy on facilitated discussion, with the intent that every participant contribute and no single voice dominate.

The broader DSEATS Thought Leadership Series is built around the following five discussion pillars that span the full arc of AI value realization in energy:

  • Metrics that matter: realizing value at the core of the business
  • Seamless executive/engineering collaboration for ROI acceleration
  • Unlocking value in complex business models and systems
  • New ways of working (humans and AI): augmentation, accountability, and trust
  • Deployment, repeatability, and industry guidance: from pilots to scalable ROI

For this inaugural session, the organizers deliberately scoped the agenda to three of these pillars where the group had the strongest cross-industry gaps to close: metrics, scaling across complex systems, and the human trust equation. The remaining two pillars, executive/engineering collaboration and deployment with industry guidance, are reserved for subsequent regional forums already in planning.

Three discussion leaders anchored the conversation. Bo Hu, director of data science and AI at ConocoPhillips, led Round 1 on metrics. Scott Sanderson, director for corporate and strategic business development at AWS, led Round 2 on complex systems. Raghu Yabaluri, senior managing director at Black & Veatch, led Round 3 on humans and AI. What follows is a synthesis of the themes that emerged across the three rounds, anonymized per the Chatham House convention. Examples cited are illustrative of patterns that surfaced repeatedly across the room rather than the view of any single participant or organization.

Round 1: Metrics That Matter

The opening round, led by Hu, examined how organizations define, justify, and sustain the metrics used to demonstrate AI’s ROI. Discussion centered on the gap between project-level key performance indicators (KPIs) and the transformational business metrics that boards and financial officers actually track.

Connect AI Metrics to Core Business Outcomes. Participants stressed that AI metrics must be anchored to outcomes senior leadership already cares about, such as production volumes, cost per barrel, and capital efficiency, rather than isolated project savings. One participant described how a major operator attributed even a conservative 1% of production gains to its data and AI efforts, translating a relatively modest initiative investment into significant demonstrated value. The key was getting that attribution agreed upon up front with executive and asset leaders, so the number carried organizational credibility.
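The attribution arithmetic described above is simple enough to sketch in a few lines. All names and figures below are hypothetical placeholders, not values cited at the forum; the point is that once the attribution share is agreed up front, the value calculation itself is trivial and auditable.

```python
# Back-of-the-envelope attribution sketch. All names and figures are
# hypothetical placeholders, not values cited at the forum.

def attributed_value(daily_production_boe, margin_per_boe,
                     attribution_share, days=365):
    """Annual value credited to a data/AI program under an attribution
    share agreed with business stakeholders up front."""
    return daily_production_boe * margin_per_boe * attribution_share * days

# e.g., 500,000 BOE/d at a $20/BOE margin with a conservative 1% share
print(attributed_value(500_000, 20.0, 0.01))
```

The credibility lives in the agreed share, not the multiplication; the code only makes the assumption explicit.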

Business Case Accountability and Post-Mortems. Several voices highlighted that business cases almost always get approved but rarely face a post-mortem. A growing practice of multiyear retrospective tracking was discussed, not to create a “witch hunt” but to identify which metrics actually moved. One example cited a maintenance initiative that cut costs on paper but never reduced working capital in the field, exposing a cultural gap between the metric and the operational reality.

Top-Down vs. Bottom-Up Metric Framing. A senior participant drew a clear distinction between project-level metrics, such as reducing maintenance for a particular asset, and transformational metrics tied to the company’s core value chain: exploration inventory growth, development learning-rate acceleration, and production network optimization. The recommendation was to start with board-level business metrics and work backward to initiative-level KPIs, not the reverse. Quarterly moving-window comparisons were suggested as a way to show incremental progress while staying connected to long-term transformation goals.
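A minimal sketch of the quarterly moving-window comparison suggested above, assuming a simple trailing-average definition; the function name and the cost-per-barrel series are illustrative, not from any participant.

```python
# Hypothetical sketch: compare the trailing window's average of a
# transformational metric against the window immediately before it.

def moving_window_delta(quarterly_values, window=4):
    """Return, for each quarter with two fully populated windows, the
    change in the trailing-`window` average versus the prior window."""
    deltas = []
    for i in range(2 * window, len(quarterly_values) + 1):
        current = sum(quarterly_values[i - window:i]) / window
        prior = sum(quarterly_values[i - 2 * window:i - window]) / window
        deltas.append(current - prior)
    return deltas

# Eight quarters of a hypothetical cost-per-barrel series ($/bbl):
series = [12.0, 11.8, 11.9, 11.5, 11.2, 11.0, 10.9, 10.8]
print(moving_window_delta(series))  # negative delta = costs trending down
```

A rolling comparison like this shows incremental movement each quarter while the window length keeps it tied to the longer transformation horizon.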

Transparency Through Dashboards. Maintaining visible, continuously updated dashboards across assets and business units was cited as essential for sustained leadership buy-in. One operator shared how its CEO required an annual value report starting midyear, with every use-case leader accountable for quantifying their contribution, creating both transparency and healthy internal pressure.

Key Takeaways.

  • Agree on attribution methodology with business stakeholders before launching AI initiatives, not after.
  • Distinguish project-level efficiency gains from transformational metrics that move the needle at the enterprise level.
  • Institute post-mortem tracking: Revisit business cases annually to validate whether promised value materialized.
  • Keep metrics visible via dashboards. Sustained visibility prevents AI from becoming a “flavor of the month.”

Round 2: Unlocking Value in Complex Systems

Sanderson framed this round with Amazon’s own supply chain as an analogy. When a customer clicks “place order,” the system instantly runs 4,000–5,000 simulations to determine the optimal fulfillment path across a massive logistics network, continuously reoptimizing as conditions change. His challenge to the room was to bring that same thinking of continuous simulation and continuous optimization into oil and gas operations.
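The simulate-then-choose pattern Sanderson described can be illustrated with a toy version: score each candidate fulfillment path under many sampled scenarios and pick the lowest expected cost. The paths, costs, delay penalty, and risk figures below are invented for illustration.

```python
import random

# Toy simulate-then-choose sketch: evaluate each candidate path under
# many sampled scenarios, then pick the lowest expected cost. All
# paths, costs, and risk figures here are invented.

def expected_cost(base_cost, delay_risk, n_sims=1000, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total = 0.0
    for _ in range(n_sims):
        delayed = rng.random() < delay_risk
        total += base_cost + (5.0 if delayed else 0.0)  # flat delay penalty
    return total / n_sims

# Candidate paths as (base shipping cost, probability of delay).
paths = {"warehouse_a": (4.0, 0.30), "warehouse_b": (4.5, 0.05)}
best = min(paths, key=lambda name: expected_cost(*paths[name]))
print(best)  # the path that is cheaper once delay risk is simulated in
```

The same shape applies to a production network: cheap-looking choices can lose to nominally costlier ones once operational risk is simulated rather than ignored, and rerunning the simulation as conditions change is what keeps the choice current.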

Scale Is the Differentiator. The group was blunt: Optimizing a single compressor or electrical submersible pump is a “science fair project.” Value materializes only when innovations scale across thousands of wells or an entire production network. One participant cited an operator attributing 50-plus cents per barrel of benefit to network-level optimization across all its wells, the kind of number that gets mentioned on quarterly earnings calls. The consensus was direct: If the CFO cannot cite it, it has not scaled.

The Data Problem Is Solvable but Nuanced. A spirited debate emerged around data readiness. The technology perspective was direct: “Lack of data should not be a limiting factor” given today’s tooling for extraction, contextualization, and management. Operators pushed back with important nuance. Industrial systems are custom-built, each asset has a unique data architecture, and historical operating data often covers only a narrow range of conditions, creating a machine learning challenge where a team may have “1% of the data needed” to optimize across all possible future states. Resolution came through the idea of a decision-first approach: Identify the critical business decision, work backward to the minimal data required, and combine physics-based models with data-driven methods rather than trying to boil the ocean with terabytes of historical data.
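One minimal reading of the decision-first, hybrid idea: let a physics-based baseline carry the trend and fit only a small data-driven correction from the sparse history. The linear “physics” relation, the constant-offset residual model, and the data below are all hypothetical simplifications for illustration.

```python
# Illustrative hybrid sketch: physics baseline + data-driven residual.
# The linear relation and all data are hypothetical placeholders.

def physics_baseline(rate):
    """Stand-in first-principles model: response proportional to rate."""
    return 0.5 * rate

def fit_residual(rates, observed):
    """Least-squares fit of a constant offset between observations and
    the physics baseline -- the data-driven correction layer."""
    residuals = [obs - physics_baseline(r) for r, obs in zip(rates, observed)]
    return sum(residuals) / len(residuals)

def hybrid_predict(rate, offset):
    return physics_baseline(rate) + offset

# Sparse history (the "1% of the data" situation): physics supplies the
# trend, and the data supply only the bias correction.
rates = [10.0, 20.0, 30.0]
observed = [6.1, 11.0, 15.9]
offset = fit_residual(rates, observed)
print(hybrid_predict(40.0, offset))  # extrapolates beyond observed rates
```

Because the trend comes from physics rather than the data, the prediction at an unobserved operating point stays plausible even though the history covers only a narrow band of conditions.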

Network Optimization Over Component Optimization. Multiple participants reinforced that the real opportunity lies in optimizing the full production network (wells, manifolds, compressor stations, pipelines, and facilities) rather than individual components. While local optimization is well understood, cross-asset and cross-geography optimization remains largely unrealized. The hybrid approach of combining domain physics with machine learning was endorsed as the practical path, particularly for older assets where retrofitting instrumentation represents a significant capital hurdle.

Key Takeaways.

  • Move from use cases to scaled business cases. Single-asset wins don’t move earnings calls.
  • Adopt a decision-first methodology: Define the decision then determine the minimum viable data set.
  • Combine physics-based models with data-driven approaches to overcome sparse historical operating data.
  • Data infrastructure gaps are solvable engineering problems, not permanent barriers to AI adoption.

Round 3: Humans and AI, Trust and Accountability

The final round, led by Yabaluri, tackled the human side: how to integrate AI into every node of the organization without creating fear, silos, or unaccountable automation.

Breaking Down Organizational Silos. A recurring theme was the three-layer disconnect in many companies. Leadership declares “we are AI-first,” a centralized digital team owns the tools, and frontline engineers continue their traditional workflows unchanged. Participants argued that transformation only happens when AI is embedded in the daily decision-making of every individual, not delegated to a separate team.

The Power User Model. The strongest consensus emerged around the “power user” concept: domain experts, such as reservoir engineers, chemical engineers, and mechanical engineers, who learn to apply AI tools to their discipline and can judge whether outputs are accurate or hallucinated. These individuals were described as “priceless.” The room agreed that the right model is to equip domain experts with data science skills, not the reverse. Pure data scientists placed into domain roles “completely fail” without deep operational context. A cautionary note was also raised about “citizen developers” who build models without proper statistical rigor. Data leakage and overfitting were cited as real risks that require formal engineering assurance processes before any model enters production.
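One concrete form of the engineering assurance check mentioned above is guarding against data leakage by validating on a strictly later time window instead of a random shuffle of time-ordered operating data. This sketch assumes simple timestamped records; the names are illustrative.

```python
# Leakage-safe split sketch: the validation set is strictly later than
# all training data, so no future information leaks into training.
# Record shape and names are illustrative assumptions.

def time_ordered_split(records, holdout_fraction=0.25):
    """Split chronologically sorted records at a cut point so that
    every validation record postdates every training record."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * (1 - holdout_fraction))
    return ordered[:cut], ordered[cut:]

records = [{"timestamp": t, "value": t * 2} for t in range(8)]
train, valid = time_ordered_split(records)
# The assurance property a reviewer can check before production:
assert max(r["timestamp"] for r in train) < min(r["timestamp"] for r in valid)
print(len(train), len(valid))
```

A check this small is exactly the kind of gate a formal assurance process can enforce before any citizen-developed model enters production.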

The SME Pipeline Paradox. If AI replaces entry-level tasks, how does the next generation of subject-matter experts (SMEs) develop? This was flagged as a critical unresolved tension. Participants acknowledged that decisions must remain human-owned, because accountability cannot be delegated to algorithms, but the pathway to building the judgment required for that accountability is being disrupted. The analogy to the transition from typewriters to computers was drawn: Entire job categories were displaced, but the work became more complex, not simpler. Another angle raised in the room was that simulation tools will help bridge the SME gap and that training and future SME development should focus on stress-testing those tools rather than retreating from them.

Culture Eats Strategy. One operator’s experience with KPI-driven accountability was cited as a model: The executive issued scorecards to every business unit tying production outcomes to digital transformation metrics. Without top-level sponsorship setting clear expectations, political resistance and inertia will prevail. Multiple participants reinforced that scaling AI solutions across business units is blocked less by technical limitations than by cultural resistance (“Our data is different; your model doesn’t work for us”), an objection that is often valid for only about 30% of requirements, while the remaining 70% could be addressed with shared solutions.

Key Takeaways.

  • Embed AI at every decision node. Don’t silo it in a “digital team.”
  • Invest in domain experts who adopt AI tools, not data scientists learning domain from scratch.
  • Establish engineering assurance processes to prevent poorly built models from entering production.
  • Address the SME pipeline paradox: If AI replaces junior tasks, design new pathways for building expert judgment.
  • Top-level sponsorship with visible accountability mechanisms is non-negotiable for cultural change.

Cross-Cutting Themes and What Comes Next

Three themes cut across all three rounds and deserve highlighting.

First, ROI from AI is a business problem before it is a technology problem. The room repeatedly returned to attribution, accountability, and alignment with executive metrics. None of these are engineering deliverables, and none can be outsourced to a centralized digital team.

Second, scale and integration are the forcing functions. A pilot that saves money on one well is easy to stand up and easy to forget. A portfolio that shows up on an earnings call requires network-level thinking, shared data foundations, and governance that holds up under audit. The cost of stopping at the pilot stage is not zero; it is the opportunity cost of the scaled value never captured.

Third, the human layer is where most programs stall. Power users, engineering assurance, and cultural sponsorship were raised more often and more forcefully than any algorithm or platform choice. The organizations that are pulling ahead are the ones investing in domain experts who carry AI fluency into their daily work, not the ones building parallel AI teams next to their operating teams.

The dialogue will continue through subsequent regional forums covering the two remaining pillars of the series: seamless executive/engineering collaboration for ROI acceleration, and deployment, repeatability, and industry guidance from pilots to scalable ROI. Insights from this and subsequent sessions will inform industry-level guidance developed by SPE DSEATS, reinforcing the role of SPE as a neutral convener for shared frameworks across operators, service companies, and technology providers.


Acknowledgments
The authors thank Black & Veatch for powering the forum and the 20 participating senior leaders from operators, service companies, and technology providers who made the conversation possible.