Geological hydrogen is emerging as a promising clean energy resource, but finding commercial quantities is challenging because the subsurface dynamics of hydrogen production, migration, and accumulation are complex. This study applies Monte Carlo simulation and an XGBoost regression model to assess the influence of formations, geological provinces, tectonic plate types, and boundary conditions on hydrogen concentrations. The key predictors identified are formation type, geological province, and proximity to province boundaries, highlighting the role of spatial relationships in hydrogen retention and potential lateral migration.
Data Set and Feature Engineering
Data Collection. The data set used in this study comprises 128 data points and focuses primarily on free hydrogen occurrences in geological settings. It includes hydrogen percentages, associated formations, and observation locations.
Feature Engineering. First, the geological province type associated with each hydrogen discovery was identified, and the distance to the nearest geological province boundary was calculated using geodesic distance methods based on Vincenty’s formulae.
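The boundary-distance feature described above can be illustrated with a minimal sketch of Vincenty's inverse formula. The sketch rests on assumptions not stated in the text: the WGS-84 ellipsoid, coordinates in decimal degrees, and a province boundary represented as a densely sampled list of (latitude, longitude) vertices, so that the minimum vertex distance approximates the distance to the boundary line. The function names are hypothetical and are not taken from the study.

```python
import math

# WGS-84 ellipsoid parameters (an assumption; the text does not name the ellipsoid).
A_AXIS = 6378137.0                 # semi-major axis in metres
FLAT = 1 / 298.257223563           # flattening
B_AXIS = (1 - FLAT) * A_AXIS       # semi-minor axis in metres


def vincenty_distance_m(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Geodesic distance in metres between two points (Vincenty inverse formula)."""
    if lat1 == lat2 and lon1 == lon2:
        return 0.0
    # Reduced latitudes on the auxiliary sphere.
    u1 = math.atan((1 - FLAT) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - FLAT) * math.tan(math.radians(lat2)))
    big_l = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)

    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # cos2_alpha is zero for a line along the equator; the term vanishes there.
        cos_2sm = (cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha) if cos2_alpha else 0.0
        c = FLAT / 16 * cos2_alpha * (4 + FLAT * (4 - 3 * cos2_alpha))
        lam_new = big_l + (1 - c) * FLAT * sin_alpha * (
            sigma + c * sin_sigma * (cos_2sm + c * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    else:
        # The iteration can fail for nearly antipodal points.
        raise ValueError("Vincenty iteration did not converge")

    u_sq = cos2_alpha * (A_AXIS ** 2 - B_AXIS ** 2) / B_AXIS ** 2
    big_a = 1 + u_sq / 16384 * (4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    big_b = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - big_b / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return B_AXIS * big_a * (sigma - delta_sigma)


def distance_to_nearest_boundary_km(point, boundary_vertices):
    """Minimum geodesic distance (km) from a point to a list of boundary vertices.

    Assumes the boundary is sampled densely enough that the nearest vertex
    is a good proxy for the nearest point on the boundary line.
    """
    lat, lon = point
    return min(vincenty_distance_m(lat, lon, v_lat, v_lon)
               for v_lat, v_lon in boundary_vertices) / 1000.0


if __name__ == "__main__":
    # Hypothetical observation point and boundary vertices, for illustration only.
    site = (0.0, 0.0)
    boundary = [(0.0, 1.0), (0.0, 2.0), (5.0, 5.0)]
    print(round(distance_to_nearest_boundary_km(site, boundary), 3))
```

In practice a tested geodesy library would normally replace the hand-written formula; the version here only makes the calculation explicit. Because the distance is taken to vertices rather than to boundary segments, sparse boundaries would overestimate the true distance.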