Weather forecasting aims to predict atmospheric phenomena within a short time-frame, generally ranging from one to three days. This information is crucial for a multitude of sectors, including agriculture, transportation, and emergency management. Factors such as precipitation, temperature, and extreme weather events are of particular interest. Forecasting methods have evolved over the years, transitioning from traditional numerical methods to more advanced hybrid and machine-learning models. This section elucidates the working principles, methodologies, and merits and demerits of traditional numerical weather prediction models, MetNet, FourCastNet, GraphCast, and PanGu.
5.1. Model Design
Numerical Weather Model. Numerical Weather Prediction (NWP) stands as a cornerstone methodology in the realm of meteorological forecasting, fundamentally rooted in the simulation of atmospheric dynamics through intricate physical models. At the core of NWP lies a set of governing physical equations that encapsulate the holistic behaviour of the atmosphere:
The Navier-Stokes Equations [74]: Serving as the quintessential descriptors of fluid motion, these equations delineate the fundamental mechanics underlying atmospheric flow.
The Thermodynamic Equations [75]: These equations intricately interrelate the temperature, pressure, and humidity within the atmospheric matrix, offering insights into the state and transitions of atmospheric energy.
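In schematic form (a simplified illustration rather than the full operational equation set, with u the wind velocity, p pressure, ρ density, T temperature, Ω the Earth's rotation vector, g gravity, ν the kinematic viscosity, and Q the diabatic heating rate), these governing equations can be written as:

```latex
% Momentum (Navier-Stokes) equation in a rotating frame of reference:
\frac{D\mathbf{u}}{Dt} = -\frac{1}{\rho}\nabla p \;-\; 2\,\boldsymbol{\Omega}\times\mathbf{u} \;+\; \mathbf{g} \;+\; \nu\nabla^{2}\mathbf{u}

% Thermodynamic energy equation (first law applied to an air parcel):
c_p \frac{DT}{Dt} \;-\; \frac{1}{\rho}\frac{Dp}{Dt} \;=\; Q
```

Here D/Dt denotes the material derivative following the flow; additional continuity and moisture equations close the system.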
The model is fundamentally based on a set of time-dependent partial differential equations, which require sophisticated numerical techniques for solving. The resolution of these equations enables the simulation of the inherently dynamic atmosphere, serving as the cornerstone for accurate and predictive meteorological insights. Within this overarching framework, a suite of integral components is embedded to address specific physical interactions that occur at different resolutions, such as cloud formation, radiation, convection, boundary layers, and surface interactions. Each of these components serves a pivotal role:
The Cloud Microphysics Parameterization Scheme is instrumental for simulating the life cycles of cloud droplets and ice crystals, thereby affecting precipitation [78,79] and the atmospheric energy balance.
Shortwave and Longwave Radiation Transfer Equations elucidate the absorption, scattering, and emission of both solar and terrestrial radiation, which in turn influence atmospheric temperature and dynamics.
Empirical or Semi-empirical Convection Parameterization Schemes simulate vertical atmospheric motions initiated by local instabilities, facilitating the capture of weather phenomena like thunderstorms.
Boundary-Layer Dynamics concentrate on the exchanges of momentum, energy, and matter between the Earth’s surface and the atmosphere, which are crucial for the accurate representation of surface conditions in the model.
Land Surface and Soil/Ocean Interaction Modules simulate the exchange of energy, moisture, and momentum between the surface and the atmosphere, while also accounting for terrestrial and aquatic influences on atmospheric conditions.
These components are tightly coupled with the core atmospheric dynamics equations, collectively constituting a comprehensive, multi-scale framework. This intricate integration allows for the simulation of the complex dynamical evolution inherent to the atmosphere, contributing to more reliable and precise weather forecasting.
In Numerical Weather Prediction (NWP), a critical tool for atmospheric dynamics forecasting, the process begins with data assimilation, where observational data is integrated into the model to reflect current conditions. This is followed by numerical integration, where the governing equations are solved step by step to simulate atmospheric changes over time. However, certain phenomena, like the microphysics of clouds, cannot be directly resolved and are accounted for through parameterization to approximate their aggregate effects. Finally, post-processing methods are used to reconcile potential discrepancies between model predictions and real-world observations, ensuring accurate and reliable forecasts. This comprehensive process captures the complexity of weather systems and serves as a robust method for weather prediction [80]. While the sophistication of NWP allows for detailed simulations of global atmospheric states, one cannot overlook the intensive computational requirements of such models. Even with the formidable processing capabilities of contemporary supercomputers, a ten-day forecast simulation can necessitate several hours of computational engagement.
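A deliberately toy sketch of this four-stage workflow (all functions and coefficients here are placeholders, not any operational NWP code) is shown below:

```python
import numpy as np

def dynamics_tendency(state):
    # Placeholder for the discretized governing equations (resolved dynamics).
    return -0.001 * state

def parameterized_tendency(state):
    # Placeholder for the sub-grid physics schemes (clouds, convection, radiation, ...).
    return 0.0005 * np.tanh(state)

def run_forecast(background_state, observations, n_steps, dt=600.0):
    """Toy NWP loop: assimilate -> integrate -> parameterize -> post-process."""
    # 1. Data assimilation: blend the model background with observations
    #    (a real system would use 3D-Var/4D-Var or an ensemble Kalman filter).
    state = 0.8 * background_state + 0.2 * observations

    for _ in range(n_steps):
        # 2. Numerical integration of the resolved dynamics (forward-Euler step).
        state = state + dt * dynamics_tendency(state)
        # 3. Parameterization of unresolved processes applied as extra tendencies.
        state = state + dt * parameterized_tendency(state)

    # 4. Post-processing: e.g., a crude bias correction of the raw output.
    return state - 0.01 * np.mean(state)
```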
MetNet. MetNet [47] is a state-of-the-art weather forecasting model that integrates the functionality of CNN, LSTM, and auto-encoder units. The CNN component conducts a multi-scale spatial analysis, extracting and abstracting meteorological patterns across various spatial resolutions. In parallel, the LSTM component captures temporal dependencies within the meteorological data, providing an in-depth understanding of weather transitions over time [43]. Autoencoders are mainly used in weather prediction for data preprocessing, feature engineering, and dimensionality reduction, assisting more complex prediction models in making more accurate and efficient predictions. This combined architecture permits a dynamic and robust framework that can adaptively focus on key features in both spatial and temporal dimensions, guided by an embedded attention mechanism [81,82].
MetNet consists of three core components: a Spatial Downsampler, a Temporal Encoder (ConvLSTM), and a Spatial Aggregator. In this architecture, the Spatial Downsampler acts as an efficient encoder that transforms complex, high-dimensional raw data into a more compact, low-dimensional, information-dense form, aiding feature extraction and data compression. The Temporal Encoder, built on the ConvLSTM (Convolutional Long Short-Term Memory) model, processes this dimensionality-reduced data along the temporal dimension. A major highlight of ConvLSTM is that it combines the advantages of CNNs and LSTMs, allowing spatial locality to be considered simultaneously with time-series analysis and increasing the model's ability to perceive complex spatio-temporal dependencies. The Spatial Aggregator plays the role of an optimized, high-level decoder: rather than simply recovering the raw data from its compressed form, it performs deeper aggregation and interpretation of global and local information through a series of axial self-attention blocks, enabling the model to make more accurate weather predictions. These three components work in concert to form a powerful and flexible forecasting model that is particularly well suited to meteorological data with a high degree of spatio-temporal complexity.
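A minimal structural sketch of this three-stage design (illustrative only, not the published MetNet implementation; the layer sizes, the simplified ConvLSTM cell, and the use of standard multi-head attention in place of axial attention are assumptions) could look like:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: convolutional gates replace the dense gates of an LSTM."""
    def __init__(self, in_channels, hidden, k=3):
        super().__init__()
        self.hidden = hidden
        self.gates = nn.Conv2d(in_channels + hidden, 4 * hidden, k, padding=k // 2)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class MetNetLikeModel(nn.Module):
    def __init__(self, in_ch=8, hidden=64, out_ch=1):
        super().__init__()
        # Spatial Downsampler: strided convolutions compress each input frame.
        self.downsampler = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Temporal Encoder: a ConvLSTM run over the sequence of downsampled frames.
        self.cell = ConvLSTMCell(hidden, hidden)
        # Spatial Aggregator stand-in: multi-head self-attention over spatial
        # positions (the real MetNet uses axial attention blocks).
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Conv2d(hidden, out_ch, 1)

    def forward(self, frames):                     # frames: (B, T, C, H, W)
        B, T = frames.shape[:2]
        h = c = None
        for t in range(T):
            x = self.downsampler(frames[:, t])
            if h is None:
                h = torch.zeros(B, self.cell.hidden, *x.shape[-2:], device=x.device)
                c = torch.zeros_like(h)
            h, c = self.cell(x, h, c)
        tokens = h.flatten(2).transpose(1, 2)      # (B, H'*W', hidden)
        mixed, _ = self.attn(tokens, tokens, tokens)
        mixed = mixed.transpose(1, 2).reshape_as(h)
        return self.head(mixed)                    # per-pixel short-term forecast
```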
The operational workflow of MetNet begins with the preprocessing of atmospheric input data, such as satellite imagery and radar information [83]. Spatial features are then discerned through the CNN layers, while temporal correlations are decoded via the LSTM units. This information is synthesized, with the attention mechanism strategically emphasizing critical regions and timeframes, leading to short-term weather forecasts ranging from 2 to 12 hours [82]. MetNet's strength lies in its precise and adaptive meteorological predictions, blending spatial and temporal intricacies, and thus offers an indispensable tool for refined weather analysis [47].
Figure 2. MetNet Structure.
FourCastNet. In response to the escalating challenges posed by global climate change and the increasing frequency of extreme weather phenomena, the demand for precise and prompt weather forecasting has surged. High-resolution weather models serve as pivotal instruments in addressing this exigency, capturing finer meteorological features and thereby rendering more accurate predictions [84,85]. Against this backdrop, FourCastNet [48] has been conceived, employing ERA5, an atmospheric reanalysis dataset. This dataset is the outcome of a Bayesian estimation process known as data assimilation, fusing observational results with numerical models' output [87]. FourCastNet leverages the Adaptive Fourier Neural Operator (AFNO), uniquely crafted for high-resolution inputs, incorporating several significant strides within the domain of deep learning.
The essence of AFNO resides in its symbiotic fusion of the Fourier Neural Operator (FNO) learning strategy with the self-attention mechanism intrinsic to Vision Transformers (ViT) [88]. While FNO, through Fourier transforms, adeptly processes periodic data and has proven efficacy in modeling complex systems of partial differential equations, its computational complexity for high-resolution inputs is prohibitive. Consequently, AFNO deploys the Fast Fourier Transform (FFT) in the Fourier domain, facilitating continuous global convolution. This innovation reduces the complexity of spatial mixing to O(N log N) in the number of spatial tokens, thus rendering it suitable for high-resolution data [86]. The workflow of AFNO encompasses data preprocessing, feature extraction with FNO, feature processing with ViT, and spatial mixing for feature fusion, culminating in a prediction output representing future meteorological conditions such as temperature, pressure, and humidity.
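The core idea of Fourier-domain spatial mixing can be sketched as follows (an illustrative simplification under assumed tensor shapes, not the published AFNO layer, which additionally applies block-diagonal MLPs and soft-thresholding in the frequency domain):

```python
import torch
import torch.nn as nn

class FourierMixing2d(nn.Module):
    """Global spatial mixing via FFT: O(N log N) in the number of grid points."""
    def __init__(self, channels, modes_h=32, modes_w=32):
        super().__init__()
        self.modes_h, self.modes_w = modes_h, modes_w
        # Learned complex-valued weights for the retained low-frequency modes.
        scale = 1.0 / channels
        self.weight = nn.Parameter(
            scale * torch.randn(channels, modes_h, modes_w, dtype=torch.cfloat))

    def forward(self, x):                                # x: (B, C, H, W)
        B, C, H, W = x.shape                             # assumes H >= modes_h, W//2+1 >= modes_w
        x_ft = torch.fft.rfft2(x, norm="ortho")          # (B, C, H, W//2+1), complex
        out_ft = torch.zeros_like(x_ft)
        # Mix only the lowest spatial frequencies: a continuous global convolution.
        out_ft[:, :, :self.modes_h, :self.modes_w] = (
            x_ft[:, :, :self.modes_h, :self.modes_w] * self.weight)
        return torch.fft.irfft2(out_ft, s=(H, W), norm="ortho")
```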
Tailoring AFNO for weather prediction, FourCastNet introduces specific adaptations. Given its distinct application scenario (predicting atmospheric variables utilizing the ERA5 dataset), a dedicated precipitation model is integrated into FourCastNet, predicting six-hour accumulated total precipitation [87]. Moreover, the training paradigm of FourCastNet includes both pre-training and fine-tuning stages: the former learns the mapping from the weather state at one time point to the next, while the latter optimizes the model over two consecutive autoregressive time steps. The advantages of FourCastNet are manifested in its unparalleled speed, approximately 45,000 times faster than conventional NWP models, and its remarkable energy efficiency, consuming about 12,000 times less energy than the IFS model [88]. The model's architectural innovations and its efficient utilization of computational resources position it at the forefront of high-resolution weather modeling.
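A hedged sketch of the two-step fine-tuning objective (hypothetical function and variable names; the published training recipe includes further details such as learning-rate schedules) could be:

```python
import torch

def two_step_finetune_loss(model, x_t, x_t1, x_t2,
                           loss_fn=torch.nn.functional.mse_loss):
    """Fine-tuning: unroll the model for two steps and supervise both predictions."""
    pred_t1 = model(x_t)        # first autoregressive step
    pred_t2 = model(pred_t1)    # feed the prediction back in for the second step
    return loss_fn(pred_t1, x_t1) + loss_fn(pred_t2, x_t2)
```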
Figure 3. (a) The multi-layer transformer architecture; (b) two-step fine-tuning; (c) backbone model; (d) forecast model in free-running autoregressive inference mode.
GraphCast. GraphCast represents a notable advance in weather forecasting, melding machine learning with complex dynamical system modelling to pave the way for more accurate and efficient predictions. It is an autoregressive model, built upon graph neural networks (GNNs) and a novel multi-scale mesh representation, trained on historical weather data from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis archive.
The structure of GraphCast employs an "encode-process-decode" configuration utilizing GNNs to autoregressively generate forecast trajectories. In detail:
Encoder: The encoder maps local regions of the input data (on the original latitude-longitude grid) onto the nodes of a multi-scale internal mesh representation. Specifically, it maps two consecutive input frames of the latitude-longitude grid, with numerous variables per grid point, into this multi-mesh representation. This mapping helps the model capture and understand spatial dependencies in the data, allowing for more accurate predictions of future weather conditions.
Processor: This component performs several rounds of message-passing on the multi-mesh, where edges can span short or long ranges, facilitating efficient communication without necessitating an explicit hierarchy. The multi-mesh is a special graph structure that represents the spatial structure of the Earth's surface in an efficient way: nodes represent specific regions of the surface, while edges represent spatial relationships between these regions. In this way, the model can capture spatial dependencies on a global scale and exploit the power of GNNs to analyze and predict weather changes (a sketch of a single message-passing round is given after the workflow description below).
Decoder: It then maps the multi-mesh representation back to the latitude-longitude grid as a prediction for the next time step.
Figure 4. (a) The encoder component of the GraphCast architecture maps the input local regions (green boxes) to the nodes of the multigrid graph. (b) The processor component uses learned message passing to update each multigrid node. (c) The decoder component maps the processed multigrid features (purple nodes) to the grid representation. (d) A multi-scale grid set.
The workflow of GraphCast begins with the input of weather state(s) defined on a high-resolution latitude-longitude-pressure-level grid. The encoder processes these inputs into a multi-scale internal mesh representation, which then undergoes many rounds of message-passing in the processor to capture spatio-temporal relationships in the weather data. Finally, the decoder translates the multi-mesh representation back to the latitude-longitude grid to generate predictions for subsequent time steps. It is worth noting that, as shown in panel (d), the multi-scale mesh mapping allows the model to capture localized weather features on a high-resolution mesh and large-scale weather features on a low-resolution mesh at the same time.
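A minimal sketch of one learned message-passing round over such a mesh graph is shown below (illustrative; the MLP update functions, hidden sizes, and residual update are assumptions rather than GraphCast's published processor):

```python
import torch
import torch.nn as nn

class MeshMessagePassing(nn.Module):
    """One round of learned message passing on a fixed mesh graph.
    edge_index: (2, E) long tensor of (sender, receiver) node indices."""
    def __init__(self, node_dim, hidden=128):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim, hidden), nn.ReLU(), nn.Linear(hidden, node_dim))
        self.node_mlp = nn.Sequential(
            nn.Linear(2 * node_dim, hidden), nn.ReLU(), nn.Linear(hidden, node_dim))

    def forward(self, nodes, edge_index):        # nodes: (N, node_dim)
        senders, receivers = edge_index
        # Compute a message on every edge from the features of its two endpoints.
        messages = self.edge_mlp(
            torch.cat([nodes[senders], nodes[receivers]], dim=-1))
        # Aggregate incoming messages at each receiver node.
        agg = torch.zeros_like(nodes).index_add_(0, receivers, messages)
        # Update node features with a residual connection.
        return nodes + self.node_mlp(torch.cat([nodes, agg], dim=-1))
```

Stacking many such rounds over edges of different lengths lets information propagate both locally and globally across the mesh before the decoder maps the result back to the grid.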
In essence, GraphCast encapsulates a pioneering stride in enhancing weather forecasting accuracy and efficiency through the amalgamation of machine learning and complex dynamical system modelling. It uniquely employs an autoregressive model structure underpinned by graph neural networks and a multi-scale mesh representation. The model’s "encode-process-decode" configuration, executed through a novel multi-mesh graphical representation, adeptly captures spatial dependencies and facilitates global-scale weather prediction. By processing high-resolution weather data inputs through a systematic workflow of encoding, message-passing, and decoding, GraphCast not only generates precise weather predictions for subsequent time intervals but also exemplifies the profound potential of machine learning in advancing meteorological forecasting methodologies.
PanGu. In the rapidly evolving field of meteorological forecasting, PanGu emerges as a pioneering model, predicated on a three-dimensional neural network that transcends the traditional boundaries of latitude and longitude. Recognizing the intrinsic relationship between meteorological data and atmospheric pressure, PanGu incorporates a neural network structure that accounts for altitude in addition to latitude and longitude. The PanGu pipeline begins with Block Embedding, where the dataset is parsed into smaller subsets or blocks. This operation reduces spatial resolution and complexity and facilitates the subsequent data management within the network.
Following block embedding, the PanGu model integrates the data blocks into a 3D cube through a process known as 3D Cube Fusion, thereby enabling data processing within a tri-dimensional space. Swin Encoding [90], a specialized Transformer encoder used in the deep learning spectrum, applies a self-attention mechanism for data comprehension and processing. This encoder, akin to the Autoencoder, excels in extracting and encoding essential information from the dataset. The ensuing phases include Decoding, which strives to unearth salient information, and Output Splitting, which partitions data into atmospheric and surface variables. Finally, Resolution Restoration reinstates the data to its original spatial resolution, making it amenable for further scrutiny and interpretation.
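A skeletal sketch of the block-embedding and restoration stages (channel counts, patch sizes, and the surface-variable split are illustrative assumptions, not Pangu-Weather's published configuration) might look like:

```python
import torch
import torch.nn as nn

class PanguLikeEmbedding(nn.Module):
    """Embed (pressure-level, lat, lon) data into 3D blocks and recover
    atmospheric and surface outputs at the original resolution."""
    def __init__(self, in_ch=5, surface_ch=4, embed_dim=192,
                 patch=(2, 4, 4), encoder=None):
        super().__init__()
        # Block embedding: a strided 3D convolution turns each
        # (2 levels x 4 lat x 4 lon) block into a single token.
        self.embed = nn.Conv3d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        # Stand-in for the Swin-style transformer encoder/decoder stack.
        self.encoder = encoder if encoder is not None else nn.Identity()
        # Resolution restoration: transposed convolution maps tokens back.
        self.restore = nn.ConvTranspose3d(embed_dim, in_ch,
                                          kernel_size=patch, stride=patch)
        self.surface_head = nn.Conv2d(in_ch, surface_ch, kernel_size=1)

    def forward(self, x):                    # x: (B, C, levels, lat, lon)
        tokens = self.embed(x)               # (B, D, levels', lat', lon')
        tokens = self.encoder(tokens)        # 3D self-attention would go here
        upper_air = self.restore(tokens)     # atmospheric variables, full resolution
        # Output splitting (simplified): derive surface fields from the lowest level.
        surface = self.surface_head(upper_air[:, :, 0])
        return upper_air, surface
```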
PanGu [50]'s innovative 3D neural network architecture [91] offers a groundbreaking perspective for integrating meteorological data, and its suitability for three-dimensional data is distinctly pronounced. Moreover, PanGu introduces a hierarchical temporal aggregation strategy, an advancement that ensures the network with the maximum lead time is consistently invoked, thereby curtailing errors. In comparison with running a model like FourCastNet [48] many times, which may accumulate errors, this approach exhibits superiority in both speed and precision. Collectively, these novel attributes and methodological advancements position PanGu as a cutting-edge tool in the domain of high-resolution weather modeling, promising transformative potential in weather analysis and forecasting.
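The hierarchical temporal aggregation strategy can be illustrated with a simple greedy dispatch sketch (the FM24/FM6/FM3/FM1 naming follows the figure below; the dispatch logic itself is an assumed simplification):

```python
def hierarchical_forecast(state, lead_time_hours, models):
    """Greedily apply the model with the largest lead time that still fits,
    so that as few autoregressive steps as possible are taken.

    models: dict mapping lead time in hours -> forecast function,
            e.g. {24: fm24, 6: fm6, 3: fm3, 1: fm1}.
    """
    remaining = lead_time_hours
    for step in sorted(models, reverse=True):   # 24, 6, 3, 1
        while remaining >= step:
            state = models[step](state)         # one forward pass of that model
            remaining -= step
    return state

# Example: a 31-hour forecast uses FM24 + FM6 + FM1, i.e. 3 model calls
# instead of 31 hourly autoregressive steps, which limits error accumulation.
```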
Figure 5. Network training and inference strategies. a. 3DEST architecture. b. Hierarchical temporal aggregation. We use FM1, FM3, FM6 and FM24 to indicate the forecast models with lead times being 1 h, 3 h, 6 h or 24 h, respectively.
MetNet, FourCastNet, GraphCast, and PanGu are state-of-the-art methods in the field of weather prediction, and they share some architectural similarities that indicate converging trends in this field. All four models initiate the process by embedding or downsampling the input data: FourCastNet uses AFNO, MetNet employs a Spatial Downsampler, and PanGu uses Block Embedding to manage the spatial resolution and complexity of the datasets, while GraphCast maps the input data from the original latitude-longitude grid into a multi-scale internal mesh representation. Spatio-temporal coding is an integral part of all networks: FourCastNet uses pre-training and fine-tuning phases to deal with temporal dependencies, MetNet uses ConvLSTM, and PanGu introduces a hierarchical temporal aggregation strategy to manage temporal correlations in the data, while GraphCast employs GNNs to capture spatio-temporal dependencies in weather data. Each model also employs a specialized approach to understand the spatial relationships within the data: FourCastNet uses AFNO along with Vision Transformers, MetNet utilizes Spatial Aggregator blocks, and PanGu integrates data into a 3D cube via 3D Cube Fusion, while GraphCast translates data into a multi-scale internal mesh. Finally, both FourCastNet and PanGu employ self-attention mechanisms, derived from the Transformer architecture, to better capture long-range dependencies in the data: FourCastNet combines FNO with ViT, and PanGu uses Swin Encoding.
5.2. Result Analysis
MetNet: According to the experiments reported for MetNet, at a threshold of 1 millimeter/hour precipitation rate, both MetNet and NWP predictions show high similarity to observed ground conditions. Evidently, MetNet exhibits a forecasting capability that is commensurate with NWP, distinguished by an accelerated computational proficiency that generally surpasses NWP's processing speed.
FourCastNet: According to the FourCastNet experiments, FourCastNet can predict wind speed 96 hours in advance with extremely high fidelity and accurate fine-scale features. In the experiments, the FourCastNet forecast accurately captured the formation and path of the super typhoon Mangkhut (Shanzhu), including its intensity and trajectory over four days, and its high resolution enables it to capture small-scale features well. Particularly noteworthy is that FourCastNet's performance in forecasting meteorological phenomena within a 48-hour horizon has surpassed the predictive accuracy intrinsic to conventional numerical weather forecasting methodologies. This constitutes a significant stride in enhancing the veracity and responsiveness of short-term meteorological projections.
GraphCast: According to the GraphCast experiments, GraphCast demonstrates superior performance in tracking weather patterns, substantially outperforming NWP at various forecasting horizons, notably from 18 hours to 4.75 days, as depicted in Figure 3b. It excels in predicting atmospheric river behaviour and extreme climatic events, with significant improvement seen in longer-term forecasts of 5 and 10 days. The model's prowess extends to accurately capturing extreme heat and cold anomalies, showcasing not just its forecasting capability but a nuanced understanding of meteorological dynamics, thereby holding promise for more precise weather predictions with contemporary data.
PanGu: According to the PanGu experiments, PanGu can predict typhoon trajectories almost exactly during the tracking of the strong tropical cyclones Kong-rey and Yutu, and is 48 hours faster than NWP. The advent of this 3D network further heralds a momentous advancement in weather prediction technology. This cutting-edge model outperforms numerical weather prediction models by a substantial margin and possesses the unprecedented ability to replicate reality with exceptional fidelity. It is not merely a forecasting tool but a near-precise reflection of meteorological dynamics, allowing for a nearly flawless reconstruction of real-world weather scenarios.
In Table 4, "Forecast-timeliness" represents the forecasting horizon of each model, indicating its ability to predict weather up to a certain number of future days. In meteorology, z500 refers to the geopotential height at the 500 hPa isobaric level, which is critical for understanding atmospheric structures and weather systems. Model evaluation often employs RMSE (Root Mean Square Error) and ACC (Anomaly Correlation Coefficient) to gauge prediction accuracy and correlation with actual observations; lower RMSE and higher ACC values indicate better model performance. Among GraphCast, PanGu, and IFS, PanGu exhibits the highest accuracy with an ACC of 0.872 for a 7-day forecast timeliness. GraphCast, while having a longer forecast timeliness of 9.75 days, has an ACC of 0.825 and an RMSE of 460, showing a balance between a longer forecasting duration and decent accuracy. In addition, reporting GPU requirements and prediction speed provides crucial reference information for model selection, especially in scenarios with limited resources or where rapid responses are required; this aids in finding a balance between efficiency and effectiveness, offering support for successful forecasting.