JENNIFER-H2: a joint-embedding foundation model for the Earth's crust.
Beneath observable geophysical measurements lies a lower-dimensional manifold of latent geological processes. JENNIFER-H2 is a self-supervised, multi-modal foundation model that learns this manifold from petabyte-scale, real-world subsurface data — seismic volumes, borehole logs, gridded gravity and magnetics, heat flow, and free-text geological descriptions — and exposes it as a continuous embedding space.
Inspired by Joint Embedding Predictive Architectures (JEPA), the model is trained to predict the latent representation of masked subsurface regions rather than their raw pixel values. This yields stable, physically meaningful descriptors that survive acquisition noise and survey artifacts, and it enables probabilistic zero-shot inversion of subsurface properties — a task that is otherwise intractable.
The first benchmark application is natural ("white") hydrogen, a zero-carbon primary energy source generated continuously by serpentinization and radiolysis. Hydrogen exploration is chosen as the first benchmark precisely because it presents the greatest density of unsolved subsurface problems: sparse observations, ephemeral signals, and no established exploration workflow. A model that performs well under these conditions is expected to generalize to better-constrained subsurface tasks such as geothermal, critical minerals, carbon storage, and conventional imaging.
Two scientific objectives. Framed around natural hydrogen and its subsurface expression.
Asymmetric encoder–predictor. JEPA-style training in latent space, not pixel space.
The encoder f_θ processes only the visible patches of a multi-modal input: colocated borehole logs, seismic traces, gridded gravity and magnetic anomalies, and text descriptions. A lightweight predictor g_φ then predicts the latent representations of the masked regions, conditioned on the visible context and a small latent variable z that captures residual uncertainty.
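One way to write this objective down, offered as a sketch rather than the model's documented loss: let x_V be the visible patches, M the set of masked positions, p_i a positional query for masked position i, and s_i the target latent at that position. The symbols x_V, M, p_i and s_i, the stop-gradient operator sg, and the squared-error distance are all assumptions borrowed from common JEPA-style formulations; the text above specifies only f_θ, g_φ and z.

```latex
\begin{align*}
\hat{s}_i &= g_\phi\bigl(f_\theta(x_V),\, p_i,\, z\bigr), \qquad i \in M \\
\mathcal{L}(\theta, \phi) &= \frac{1}{|M|} \sum_{i \in M}
    \bigl\lVert \hat{s}_i - \operatorname{sg}[s_i] \bigr\rVert_2^2
\end{align*}
```

In many JEPA variants the targets s_i come from a slowly updated copy of the encoder applied to the full input. Whether JENNIFER-H2 does the same is not stated here.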
The encoder is deliberately much larger than the predictor: at inference time, only the encoder transfers to downstream tasks, and the predictor is discarded. This asymmetry encourages the encoder to learn rich, transferable descriptors rather than survey-specific shortcuts.
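The encoder–predictor asymmetry can be illustrated with a minimal NumPy sketch. Everything in it is hypothetical: the layer sizes, the use of small MLPs, the mean-pooled context, the one-hot position queries, and the reuse of the same encoder to produce targets are assumptions made for illustration, not details of JENNIFER-H2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not taken from the model).
N_PATCHES, PATCH_DIM, LATENT_DIM, Z_DIM = 16, 32, 64, 4
N_MASKED = 6


def init_mlp(sizes):
    """Random weights and zero biases for a fully connected network."""
    return [
        (rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def mlp(params, x):
    """Apply the network; tanh on every layer except the last."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


def n_params(params):
    return sum(w.size + b.size for w, b in params)


# Large encoder (stand-in for f_theta), small predictor (stand-in for g_phi).
encoder = init_mlp([PATCH_DIM, 256, 256, LATENT_DIM])
predictor = init_mlp([LATENT_DIM + N_PATCHES + Z_DIM, 32, LATENT_DIM])

# Hypothetical patch features and a random mask over patch positions.
patches = rng.normal(size=(N_PATCHES, PATCH_DIM))
mask = np.zeros(N_PATCHES, dtype=bool)
mask[rng.choice(N_PATCHES, size=N_MASKED, replace=False)] = True

# The encoder sees only the visible patches.
context = mlp(encoder, patches[~mask])
pooled = context.mean(axis=0)

# Target latents for the masked patches. Here the same encoder is reused
# and the result is treated as a constant; this is an assumption.
targets = mlp(encoder, patches[mask])

# The predictor receives the pooled context, a one-hot query for each
# masked position, and a small latent z for residual uncertainty.
z = rng.normal(size=Z_DIM)
queries = np.eye(N_PATCHES)[mask]
predictor_in = np.concatenate(
    [np.tile(pooled, (N_MASKED, 1)), queries, np.tile(z, (N_MASKED, 1))],
    axis=1,
)
preds = mlp(predictor, predictor_in)

# Loss is measured between latents, never between raw patch values.
loss = float(np.mean((preds - targets) ** 2))
print("encoder params:", n_params(encoder))
print("predictor params:", n_params(predictor))
```

For downstream use in this sketch, only `encoder` would be kept; `predictor` exists solely to define the training signal.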
Predicting in latent space — rather than reconstructing raw measurements — is the critical design choice. It avoids overfitting to acquisition noise, polarity conventions, and vintage-specific artifacts that are abundant in real subsurface data and would dominate any pixel-level reconstruction loss.
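A toy comparison makes the polarity point concrete. The sketch below is an illustration under strong assumptions: the "trace" is random noise, and `toy_encoder` is a hand-built feature map (windowed mean absolute amplitude) that is polarity-invariant by construction. It stands in for a learned encoder and is not part of the model.

```python
import numpy as np


def toy_encoder(trace, n_windows=8):
    """Hypothetical stand-in encoder: mean absolute amplitude per window.

    Taking the absolute value makes the features independent of polarity.
    """
    windows = np.abs(trace).reshape(n_windows, -1)
    return windows.mean(axis=1)


def mse(a, b):
    return float(np.mean((a - b) ** 2))


rng = np.random.default_rng(0)
trace = rng.normal(size=64)  # hypothetical seismic trace
flipped = -trace             # same signal, opposite polarity convention

# A pixel-level loss treats the polarity flip as a large error.
pixel_loss = mse(trace, flipped)

# A loss on polarity-invariant latents does not see the flip at all.
latent_loss = mse(toy_encoder(trace), toy_encoder(flipped))

print("pixel-space loss:", pixel_loss)
print("latent-space loss:", latent_loss)
```

In this toy setting the invariance is designed in by hand. In a JEPA-style model the hope is that the encoder learns comparable invariances from data, because nothing in the latent objective rewards reproducing such nuisance detail.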