More Data, Less Clarity
Are we advancing understanding or just generating more data?

[Figure: evolution of climate model components toward fully coupled Earth system models]
Climate models can now simulate the entire Earth system in extraordinary detail. They integrate atmospheric chemistry, ocean biogeochemistry, ice sheet dynamics, and land-surface processes into unified computational frameworks that would have been unimaginable a generation ago. But there is a paradox at the heart of this progress: the models keep getting better at producing data, while our understanding of what that data means is not keeping pace.
What began as simulations of basic atmospheric and oceanic physics has transformed into Earth System Models (ESMs) that integrate complex biogeophysical and chemical processes. Advances in satellite remote sensing and supercomputing have made this leap possible, but they've also created an unexpected problem: the models now generate data faster than it can be meaningfully analyzed (Bordoni et al., 2025).
The productive balance between observation, theory, and modeling that has historically driven climate science forward is breaking down. Computational power has surged ahead. Theoretical understanding has not kept up. And the result is a growing gap between what we can simulate and what we actually comprehend about the climate system (Emanuel, 2020).
The Data Deluge Problem
The Coupled Model Intercomparison Project (CMIP) illustrates this tension clearly. CMIP is the largest coordinated effort in climate science: dozens of modeling centers around the world run their models under shared experimental protocols, producing a common archive of simulations that underpins most of what we know (and claim to know) about future climate change. IPCC assessments, national adaptation plans, and climate risk products all draw heavily on CMIP output.
The latest completed phase, CMIP6, produced petabytes of data. Future phases are projected to generate around 300 petabytes. Yet fundamental questions remain stubbornly unresolved.
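Even pulling a small slice of this archive is nontrivial. Below is a minimal sketch of one common access path, assuming the intake-esm catalog of CMIP6 output that the Pangeo project hosts on Google Cloud; the catalog URL and search fields are specific to that mirror, and the query values are illustrative choices, not prescribed by CMIP.

```python
# Minimal sketch: query a small slice of the CMIP6 archive.
# Assumes the Pangeo-hosted catalog on Google Cloud; the URL and the
# column names in the search are specific to that mirror.
import intake  # requires the intake-esm plugin to be installed

CATALOG_URL = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(CATALOG_URL)

# Monthly near-surface air temperature, one ensemble member per model,
# for a high-emissions scenario (all illustrative choices).
query = col.search(
    experiment_id="ssp585",
    table_id="Amon",
    variable_id="tas",
    member_id="r1i1p1f1",
)
print(f"{len(query.df)} datasets across {query.df['source_id'].nunique()} models")

# Open lazily as xarray Datasets backed by cloud zarr stores; nothing
# substantial is downloaded until values are actually computed.
dsets = query.to_dataset_dict()
```

Even a query this narrow can return dozens of models, and a single daily-resolution variable across the ensemble can run to terabytes.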
CMIP6 models show higher climate sensitivity than their CMIP5 predecessors, and we still cannot fully explain why. Are these shifts genuine advances in representing physical processes like cloud feedbacks and aerosol interactions? Or are they artifacts of compensating errors that happen to produce different numbers? The sheer volume of output makes it harder, not easier, to diagnose where the physics diverges from reality.
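One standard diagnostic exists for exactly this question: the Gregory regression. In an abrupt-4xCO2 experiment, fitting the top-of-atmosphere radiative imbalance against surface warming gives the feedback parameter as the slope, and the warming at which the fit crosses zero yields an effective climate sensitivity (halved to express it per doubling of CO2). A minimal sketch with synthetic annual means, not output from any real model:

```python
# Gregory regression sketch: effective climate sensitivity (ECS) from
# annual means of an abrupt-4xCO2 run. All values below are synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "truth": 4xCO2 forcing of 8 W/m^2, feedback of -1 W/m^2 per K
F_4x, lam = 8.0, -1.0
years = np.arange(1, 151)
dT = 6.0 * (1.0 - np.exp(-years / 40.0))              # warming (K)
dN = F_4x + lam * dT + rng.normal(0.0, 0.3, dT.size)  # TOA imbalance (W/m^2)

# Ordinary least squares fit of dN = F + lambda * dT
lam_hat, F_hat = np.polyfit(dT, dN, 1)

# Warming at which the fitted imbalance reaches zero, halved for 2xCO2
ecs = -F_hat / lam_hat / 2.0
print(f"feedback {lam_hat:+.2f} W/m^2/K, ECS ~ {ecs:.1f} K")
```

The regression itself is trivial. The hard part is attributing a shift in the slope to a specific physical process, and that is precisely where data volume alone does not help.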
Data management has become a discipline in its own right: necessary, but insufficient. Storing, indexing, and distributing petabytes of model output is an engineering achievement. It is not, by itself, a scientific one. And when data availability starts driving the research agenda, displacing hypothesis-driven inquiry with data-driven exploration, something important gets lost (Byrne et al., 2024).
What Theory Actually Buys You
Consider a concrete example. In tropical meteorology, decades of theoretical work on convective quasi-equilibrium and potential intensity theory gave researchers a framework for understanding why tropical cyclones intensify, what sets their upper bound, and how a warming climate might shift these limits. This theoretical scaffolding means that when a high-resolution model produces a hurricane simulation, scientists can evaluate whether the result is physically plausible, not just whether it looks realistic.
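In one common simplified form, potential intensity theory compresses this into a single expression: the square of the maximum sustained wind speed scales as a thermodynamic efficiency times the air-sea enthalpy disequilibrium. A back-of-envelope sketch, with illustrative input values rather than observations:

```python
# Back-of-envelope potential intensity, one common simplified form:
#   v_max^2 = (C_k / C_d) * ((T_s - T_o) / T_o) * (k_s - k)
# All input values below are illustrative assumptions.
import math

Ck_over_Cd = 0.9      # ratio of enthalpy to drag exchange coefficients
T_s = 302.0           # sea surface temperature (K)
T_o = 200.0           # outflow temperature near the tropopause (K)
delta_k = 15_000.0    # air-sea enthalpy disequilibrium, k_s - k (J/kg)

efficiency = (T_s - T_o) / T_o
v_max = math.sqrt(Ck_over_Cd * efficiency * delta_k)
print(f"potential intensity ~ {v_max:.0f} m/s")  # roughly 83 m/s here
```

The number matters less than the structure: warmer oceans and colder outflow raise the bound, so a simulated storm that blows past it signals a problem with the simulation, not a stronger hurricane.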
Without that kind of theoretical grounding, a model becomes an oracle: it produces answers, but no one can explain why those answers should be trusted. In climate science, where decisions hinge on projections decades into the future, this distinction matters enormously.
The point is not that computational modeling is misguided. It is that theory and modeling are complementary, and the balance between them has tilted too far in one direction. Maximizing the value of existing model outputs requires focused effort to diagnose model errors, identify which processes genuinely improve predictive skill, and separate robust signals from computational noise.
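Separating robust signals from noise is itself a concrete, quantitative exercise. One simple heuristic compares the multi-model mean change against the inter-model spread, grid cell by grid cell; here is a minimal sketch with synthetic placeholder data standing in for an ensemble of projections:

```python
# Robust-signal heuristic: flag grid cells where the multi-model mean
# change exceeds the inter-model spread. Data are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_models, ny, nx = 30, 36, 72  # toy ensemble on a coarse lat-lon grid

# A shared latitudinal signal plus model-specific noise
signal = np.broadcast_to(np.linspace(-1.0, 2.0, ny)[:, None], (ny, nx))
changes = signal + rng.normal(0.0, 1.0, size=(n_models, ny, nx))

mean = changes.mean(axis=0)
spread = changes.std(axis=0, ddof=1)
robust = np.abs(mean) > spread  # crude signal-to-noise > 1 mask

print(f"{robust.mean():.0%} of grid cells pass the |mean| > spread test")
```

Heuristics like this are routine. What is scarce is the theory to say whether a cell that fails the test reflects irreducible uncertainty or a structural error shared across models.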
The Stakes for Climate Risk
This tension is not confined to academia. It plays out directly in the climate technology sector, where the same temptation exists: to assume that more data and higher resolution automatically produce better products.
They do not.
When models show diverging climate sensitivities and regional projections vary widely, the critical question is not which model has the finest grid spacing. It is which physical processes drive predictability in the variables that matter for real decisions: precipitation thresholds that trigger insurance payouts, temperature extremes that stress infrastructure, drought indices that shape agricultural planning.
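A toy calculation makes the point. Take two hypothetical models whose daily precipitation distributions have the same mean but different tails, and evaluate both against the same payout trigger; the distributions and the 50 mm/day threshold are invented for illustration.

```python
# Toy calculation: the same payout trigger evaluated against two models
# with equal mean rainfall but different tails. All values are invented.
import numpy as np

rng = np.random.default_rng(2)
THRESHOLD = 50.0  # hypothetical payout trigger (mm/day)

# Both distributions have a mean of 10 mm/day (100 years of daily values)
model_a = rng.gamma(shape=2.0, scale=5.0, size=36500)   # thinner tail
model_b = rng.gamma(shape=0.5, scale=20.0, size=36500)  # heavier tail

for name, daily in [("model A", model_a), ("model B", model_b)]:
    p = (daily > THRESHOLD).mean()
    print(f"{name}: mean {daily.mean():.1f} mm/day, "
          f"P(daily > {THRESHOLD:.0f} mm) = {p:.4f}")
```

Identical mean rainfall, exceedance probabilities roughly fifty times apart: the tail behavior, which is set by the model's physics, is what the payout actually depends on.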
A climate risk product that cannot explain the physical basis for its predictions is fundamentally limited, regardless of how much computation went into it. If two models disagree on future rainfall trends in a given region, a practitioner needs to understand why they disagree (which cloud parameterization differs, which land-surface feedback is handled differently) to make an informed judgment about which projection to trust.
This demands a kind of expertise that sits at the intersection of climate physics and applied data science. Neither discipline alone is sufficient: climate scientists bring physical understanding of model behavior, and data scientists bring the tools to extract signal from massive, noisy datasets. The competitive advantage belongs to organizations that can combine both.
The Window Is Narrowing
The window for climate action is not waiting for models to converge. Adaptation decisions, infrastructure investments, and risk assessments are being made now, on the basis of climate projections whose uncertainties are not always well characterized.
The path forward is not simply more computing power or bigger datasets. It is conceptual clarity: understanding climate physics well enough to know which predictions are robust, which uncertainties are reducible with better observations or theory, and which remain irreducible regardless of computational resources.
More data is not the bottleneck. Understanding is.
References
- Bordoni, S., Kang, S.M., Shaw, T.A. et al. The futures of climate modeling. npj Clim Atmos Sci 8, 99 (2025). https://doi.org/10.1038/s41612-025-00955-8
- Byrne, M.P., Hegerl, G.C., Scheff, J. et al. Theory and the future of land-climate science. Nat. Geosci. 17, 1079–1086 (2024). https://doi.org/10.1038/s41561-024-01553-8
- Emanuel, K. A. The relevance of theory for contemporary research in atmospheres, oceans, and climate. AGU Advances 1, e2019AV000129 (2020). https://doi.org/10.1029/2019AV000129