Prediction Rollouts#
Prediction Ingredients#
Before beginning rollouts of a CREDIT model, you will need the following ingredients/files available on your machine.
🌎Initial conditions for upper air and surface variables in Zarr format. If running processed ERA5 on Derecho or Casper, you can access the processed files at
/glade/campaign/cisl/aiml/credit/era5_zarr/. They_TOTAL*.zarrandSixHourly_y_TOTAL*.zarrare at 0.28 degree grid spacing, andSixHourly_y_ONEdeg*.zarrfor 1 degree data.🌞 Dynamic forcing files covering the full period of prediction. In the current CREDIT models, the only dynamic forcing variable is top-of-atmosphere shortwave irradiance. Pre-calculated solar irradiance values integrated over 1 and 6 hour periods are available on Derecho/Casper at
/glade/campaign/cisl/aiml/credit/credit_solar_nc_6h_0.25degandcredit_solar_nc_1h_0.25deg. You can calculate top of atmosphere solar irradiance for any grid and integration time period withapplications/calc_global_solar.py. If you plan to issue regular predictions, we recommend pre-computing solar irradiance values for a given year of inference rather than calculating on the fly.⛰️ Static forcing files with and without normalization. These forcing files include elements like terrain height, land-sea mask, and land-use type. Static forcing files for the initial CREDIT models are currently archived at
/glade/campaign/cisl/static_scalers/.static_norm_old.nchas normalized terrain height and land sea mask, while unnormalized values are inLSM_static_variables_ERA5_zhght.nc. The unnormalized values are needed for interpolation to pressure and height levels.Files containing the mean and standard deviation scaling values for each variable. Currently, CREDIT uses values stored in netCDF files. These are currently stored on Derecho in
/glade/campaign/cisl/static_scalers/. The appropriate files to use aremean_6h_1979_2018_16lev_0.25deg.ncfor the mean andstd_residual_6h_1979_2018_16lev_0.25deg.ncfor the combined standard deviation of each variable and the standard deviation of the temporal residual.
Realtime Rollouts#
The goal of realtime inference is to launch model forecasts from GFS or ERA5 initial conditions.
The predict section of your configuration file should contain the following fields:
predict:
mode: none
realtime:
forecast_start_time: "2025-04-14 12:00:00" # change to your init date
forecast_end_time: "2025-04-24 12:00:00" # Should be sometime after init date
forecast_timestep: "6h" # Needs to contain h for hours and should match 1 or 6 hour model.
initial_condition_path: "/path/to/gfs_init/" # change
static_fields: "/Users/dgagne/data/CREDIT_data/LSM_static_variables_ERA5_zhght.nc" # Static forcing file.
metadata: '/Users/dgagne/miles-credit/credit/metadata/era5.yaml' # Path to metadata for output
save_forecast: '/Users/dgagne/data/wxformer_6h_test/' # path to save forecast data
If you want to use GFS initial conditions, run python applications/gfs_init.py -c <config file>.
It will download fields from a GFS initial condition on model levels, which are archived for the past 10 days
on the NOAA NOMADS server. GDAS Analyses and
GFS initial timesteps on model levels are also available on
Google Cloud back to 2021.
The gfs_init.py program regrids the data onto the appropriate CREDIT grid and interpolates in
the vertical from the GFS to selected CREDIT ERA5 hybrid sigma-pressure levels.
[!IMPORTANT]
gfs_init.pyrequires xesmf, which depends on the ESMF suite and cannot be installed from PyPI. The easist way to install xesmf without messing up your CREDIT environment is to runconda install -c conda-forge esmfthenpip install xesmfafter building your CREDIT environment first.
Realtime rollouts are handled by applications/rollout_realtime.py. Update the paths in the
data section of the config file to point to the GFS initial conditions zarr file. rollout_realtime.py
only outputs one forecast at a time.
Rollout to netCDF for ERA5 initiated forecasts#
rollout_to_netcdf.py enables you to generate forecasts for many initialization times using
processed ERA5 data as initial conditions. It can be run either in serial or parallel mode with
both single node and multi node support (MPI-enabled PyTorch required).
To run rollout_to_netcdf.py include the following section in your config file.
predict:
mode: none
forecasts:
type: "custom" # keep it as "custom"
start_year: 2020 # year of the first initialization (where rollout will start)
start_month: 1 # month of the first initialization
start_day: 1 # day of the first initialization
start_hours: [0, 12] # hour-of-day for each initialization, 0 for 00Z, 12 for 12Z
duration: 32 # number of days to initialize, starting from the (year, mon, day) above
# duration should be divisible by the number of GPUs
# (e.g., duration: 384 for 365-day rollout using 32 GPUs)
days: 10 # forecast lead time as days (1 means 24-hour forecast)
To submit the rollout script as a PBS job, use python rollout_to_netcdf.py -l 1 -c <config file>.
To issue predictions on multiple GPUs on a single node:
torchrun rollout_to_netcdf.py -c <confg file>
For multi-node rollouts with MPI (MPI-enabled PyTorch required):
nodes=( $( cat $PBS_NODEFILE ) )
head_node=${nodes[0]}
head_node_ip=$(ssh $head_node hostname -i | awk '{print $1}')
export NUM_RANKS=32
MASTER_ADDR=$head_node_ip
MASTER_PORT=1234
mpiexec -n $NUM_RANKS -ppn 4 --cpu-bind none python rollout_to_netcdf.py -c <config file>
Interpolation to constant pressure and height above ground levels#
Both rollout_realtime.py and rollout_to_netcdf.py support vertical interpolation to constant
pressure and constant height above ground level (AGL) levels from the hybrid sigma-pressure levels
used by most models in CREDIT. To enable interpolation, add the following lines to your config
file in the predict section
data:
level_ids: [10, 30, 40, 50, 60, 70, 80, 90, 95, 100, 105, 110, 120, 130, 136, 137]
predict:
interp_pressure:
pressure_levels: [300.0, 500.0, 850.0, 925.0] # in hPa
height_levels: [100.0, 500.0, 1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0] # in meters
More configuration options are listed in full_state_pressure_interpolation in credit/interp.py
and can be set from the config file in the interp_pressure section. The interpolation routine
interpolates to pressure levels using approximately the same approach as ECMWF, although results
will not be exactly the same due to slight numerical and implementation differences. The routine
also calculates pressure and geopotential on all levels. Mean sea level pressure is also calculated
in this routine.
Saving compressed and chunked netCDF files#
By default, the rollout scripts will save uncompressed netCDF files. These can grow to be quite large if you are producing a lot of forecasts and are saving all the fields. Space can be saved greatly by turning on netCDF compression and setting chunks that align with your preferred access pattern. Encoding options like the ones below go into the config file.
model:
# crossformer example
type: "crossformer"
frames: 1 # number of input states (default: 1)
image_height: &height 640 # number of latitude grids (default: 640)
image_width: &width 1280 # number of longitude grids (default: 1280)
levels: &levels 16 # number of upper-air variable levels (default: 15)
channels: 4 # upper-air variable channels
predict:
ua_var_encoding:
zlib: True # turns on zlib compression.
complevel: 1 # ranges from 1 to 9. 1 is faster with a lower compression ratio, 9 is slower.
shuffle: True
chunksizes: [1, *levels, *height, *width]
pressure_var_encoding:
zlib: True
complevel: 1
shuffle: True
chunksizes: [ 1, 4, *height, *width] # second dim should match number of interp pres. levels
height_var_encoding:
zlib: True
complevel: 1
shuffle: True
chunksizes: [ 1, 8, *height, *width] # second dim should match number of interp height levels
surface_var_encoding:
zlib: true
complevel: 1
shuffle: True
chunksizes: [1, *height, *width]