
Weather forecasting is the process of predicting atmospheric conditions (temperature, precipitation, wind, humidity, etc.) over short to medium time scales, from hours to about two weeks ahead. It has evolved dramatically from rule-of-thumb methods to highly sophisticated computer models.
Modern forecasting relies primarily on Numerical Weather Prediction (NWP) — physics-based models that solve the fundamental equations of fluid dynamics, thermodynamics, and conservation laws on a global 3D grid. These models ingest vast amounts of real-time data from satellites, radars, weather stations, buoys, aircraft, and balloons.
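To give a flavour of what "solving the equations on a grid" means, here is a deliberately tiny sketch: a first-order upwind scheme that carries a temperature anomaly along a one-dimensional periodic grid at constant wind speed. It is an illustration only. Operational NWP models solve the full three-dimensional equations with far more sophisticated numerics, and the grid size, Courant number, and temperature values below are made-up assumptions.

```python
def upwind_step(field, courant):
    """Advance a periodic 1D field by one time step of first-order upwind advection.

    courant = wind_speed * dt / dx; the scheme is only stable for 0 <= courant <= 1.
    """
    if not 0.0 <= courant <= 1.0:
        raise ValueError("courant number must lie in [0, 1] for stability")
    # field[i - 1] with i = 0 wraps to the last cell, which gives periodic boundaries.
    return [field[i] - courant * (field[i] - field[i - 1]) for i in range(len(field))]


def advect(field, courant, steps):
    """Apply upwind_step repeatedly and return the final field."""
    for _ in range(steps):
        field = upwind_step(field, courant)
    return field


# Hypothetical warm anomaly (deg C) on a 10-cell periodic grid.
initial = [0.0, 0.0, 5.0, 10.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
later = advect(initial, courant=0.5, steps=4)
print([round(value, 2) for value in later])
```

With a Courant number of 0.5 the anomaly drifts downstream and spreads out slightly. That smearing is numerical diffusion, a known weakness of this simple scheme.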
AI models such as GraphCast (Google DeepMind), Pangu-Weather, Fuxi, and ECMWF AIFS have transformed the field:
- They are trained on decades of reanalysis data (e.g., ERA5).
- They are extremely fast and computationally cheap to run (often 1,000x faster than traditional NWP).
- They match or outperform physics models on many average metrics, especially for large-scale patterns and routine forecasts.
- Limitation: pure AI models tend to underestimate record-breaking extremes (intense heatwaves, cold snaps, strong winds) because these events lie in the tails of the training data.
Weather forecasting remains a blend of science, data, and human expertise. While AI has accelerated progress, the most reliable systems in 2026 integrate physics foundations with machine learning. This hybrid era is delivering faster, more accurate, and more actionable forecasts — ultimately helping save lives and property.
Recent research confirms that physics-based numerical weather prediction (NWP) models, such as ECMWF’s HRES, generally outperform current AI/ML weather models at predicting record-breaking extreme weather events.
The May 2026 study, published in Science Advances, directly compared the physics-based High RESolution forecast (HRES) from the European Centre for Medium-Range Weather Forecasts against leading AI models (GraphCast, GraphCast operational, Pangu-Weather, Pangu-Weather operational, and Fuxi). It analyzed thousands of record-breaking heat, cold, and wind events from 2018 and 2020.
For extremes: HRES showed consistently smaller forecast errors than the AI models across nearly all lead times for record-breaking events. AI models systematically underestimated the intensity and frequency of these extremes (the more extreme the record, the larger the underestimation).
Overall/average forecasts: AI models often match or exceed HRES in standard metrics (e.g., RMSE for temperature or wind) for typical conditions, and they are dramatically faster and cheaper to run.
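The contrast between the two findings above comes down to which cases the error metric is computed over. A minimal sketch, using invented numbers rather than any data from the study, shows how one forecast can have the lower RMSE on typical days and the higher RMSE on record days. The names `model_a`, `model_b`, and `is_record` are hypothetical.

```python
import math


def rmse(forecasts, observations):
    """Root-mean-square error between two equally long, non-empty sequences."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_where(forecasts, observations, mask):
    """RMSE restricted to the positions where mask is True."""
    pairs = [(f, o) for f, o, keep in zip(forecasts, observations, mask) if keep]
    return rmse([f for f, _ in pairs], [o for _, o in pairs])


# Invented daily maximum temperatures (deg C); the last two days are flagged as records.
observed = [21.0, 23.5, 19.0, 25.0, 22.0, 38.0, 40.5]
is_record = [False, False, False, False, False, True, True]
typical = [not flag for flag in is_record]

# model_a is sharp on typical days but damps the extremes; model_b is the reverse.
model_a = [21.2, 23.3, 19.1, 24.8, 22.1, 35.0, 36.5]
model_b = [21.8, 22.7, 19.9, 25.9, 21.2, 37.4, 39.8]

for name, forecast in (("model_a", model_a), ("model_b", model_b)):
    print(
        name,
        "typical RMSE:", round(rmse_where(forecast, observed, typical), 2),
        "record RMSE:", round(rmse_where(forecast, observed, is_record), 2),
    )
```

Because record events are rare, they carry little weight in an RMSE averaged over all days, so a headline score can hide exactly the errors that matter most for warnings.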
AI models (e.g., graph neural networks like GraphCast) are trained on historical data (often reanalysis like ERA5). They excel at interpolating common patterns but are less reliable when extrapolating to unprecedented extremes (“gray swans” or records far outside the training distribution). Physics-based models solve the governing equations of fluid dynamics, thermodynamics, etc., so they better handle novel conditions as long as the physics approximations hold.
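The interpolation-versus-extrapolation point can be illustrated with a toy data-driven predictor. The sketch below uses a k-nearest-neighbour average, which is not how GraphCast or the other models work (they are deep neural networks and can extrapolate to some degree). It is chosen only because it makes the pull toward previously seen values easy to see. The linear "truth" and all numbers are assumptions.

```python
def knn_predict(train_x, train_y, x, k=3):
    """Predict y at x as the mean target of the k training inputs closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def truth(x):
    """Hypothetical true relationship between a predictor and temperature (deg C)."""
    return 2.0 * x + 20.0


# Training data only covers predictor values from 0 to 10.
train_x = [float(i) for i in range(11)]
train_y = [truth(x) for x in train_x]

inside = knn_predict(train_x, train_y, 5.0)    # within the training range
outside = knn_predict(train_x, train_y, 15.0)  # beyond anything seen in training

print("inside: ", inside, "truth:", truth(5.0))
print("outside:", outside, "truth:", truth(15.0))
```

Inside the training range the toy predictor recovers the truth. Outside it, the prediction can never exceed the largest training target, so the unprecedented case is underestimated.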
_____________________________________________________________________________________
Physics-based models outperform AI weather forecasts of record-breaking extremes
The recent peer-reviewed study published in Science Advances (April/May 2026) confirms that physics-based numerical weather prediction (NWP) models, particularly ECMWF’s High RESolution forecast (HRES), consistently outperform leading AI models in forecasting record-breaking extreme weather events.
Researchers analyzed thousands of record-breaking hot, cold, and high-wind events from 2018 and 2020. They compared HRES against state-of-the-art AI models including:
- GraphCast (and its operational version)
- Pangu-Weather (and its operational version)
- Fuxi
For record-breaking extremes, HRES showed smaller forecast errors across nearly all lead times for temperature (heat/cold) and wind speed records.
AI models systematically underestimated the intensity and frequency of these extremes. The more extreme the record (larger deviation from past norms), the greater the underprediction, especially for hot records.
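To make "record" and "how extreme the record is" concrete, here is a small sketch that scans a series for new highs and reports the margin by which each one beats the previous maximum, together with the forecast bias at that point. The definition used (a value exceeding every earlier value in the same series) is an assumption; the study's exact definition is not given here. All values are invented. Cold records would use a running minimum instead.

```python
def record_exceedances(series):
    """Return (index, value, margin) for every value that beats all earlier values.

    margin is the amount by which the new record exceeds the previous maximum.
    """
    if not series:
        return []
    records = []
    running_max = series[0]
    for index, value in enumerate(series[1:], start=1):
        if value > running_max:
            records.append((index, value, value - running_max))
            running_max = value
    return records


# Invented annual maximum temperatures (deg C) and matching forecasts.
observed = [34.1, 35.0, 33.8, 36.2, 35.9, 39.5]
forecast = [34.0, 34.8, 33.9, 35.7, 35.8, 37.4]

records = record_exceedances(observed)
for index, value, margin in records:
    bias = forecast[index] - value  # negative means the record was underpredicted
    print(f"record {value:.1f} C, exceedance {margin:.1f} C, forecast bias {bias:+.1f} C")
```

In this made-up series the largest exceedance also has the largest negative bias, which is the pattern the study reports for hot records.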
Under average or typical conditions, AI models often match or beat HRES on standard metrics such as RMSE, and they run orders of magnitude faster and cheaper.
AI models are trained primarily on historical reanalysis data (e.g., ERA5). They excel at interpolating common patterns but struggle with extrapolation to unprecedented “gray swan” or record events outside (or far in the tails of) their training distribution.
Physics-based models solve the underlying equations of atmospheric dynamics, thermodynamics, and conservation laws. They remain more robust for novel or extreme conditions, as long as the physical approximations and resolution are adequate.
This study serves as a timely caution against over-relying on AI alone for critical early warnings of disasters, where underestimating record events can have severe consequences. AI weather forecasting is advancing quickly, but physics-based foundations still provide superior reliability for the most impactful extremes as of 2026.
Published: Science Advances Vol 12, Issue 18
DOI: 10.1126/sciadv.aec1433
Authors: Zhongwei Zhang, Erich Fischer, Jakob Zscheischler, and Sebastian Engelke
Abstract
Artificial intelligence (AI)–based models are revolutionizing weather forecasting and have surpassed leading numerical weather prediction systems on various benchmark tasks. However, their ability to extrapolate and reliably forecast unprecedented extreme events remains unclear. Here, we show that for record-breaking weather extremes, the physics-based numerical model High RESolution forecast (HRES) from the European Centre for Medium-Range Weather Forecasts still consistently outperforms state-of-the-art AI models GraphCast, GraphCast operational, Pangu-Weather, Pangu-Weather operational, and Fuxi. We demonstrate that forecast errors in AI models are consistently larger for record-breaking heat, cold, and wind than in HRES across nearly all lead times. We further find that the examined AI models tend to underestimate both the frequency and intensity of record-breaking events, and they underpredict hot records and overestimate cold records with growing errors for larger record exceedance. Our findings underscore the current limitations of AI weather models in extrapolating beyond their training domain and in forecasting the potentially most impactful record-breaking weather events that are particularly frequent in a rapidly warming climate. Further rigorous verification and model development is needed before these models can be solely relied upon for high-stakes applications such as early warning systems and disaster management.
