What’s in the Configuration File?

Contents

What’s in the Configuration File?#

Your configuration file drives everything from the model training to inference, to creating validation runs. This page documents the possible config options and the what the flags / settings do.

CREDIT Configuration Guide#

Overview#

This document provides detailed instructions on configuring configuration.yml for running CREDIT.

Key Topics Covered:

  • Understanding and modifying configuration.yml

  • Standard Configuration Values and Recommendations

  • Best Practices and troubleshooting

Summary tables are included at the end of each subsection.


General Setup#

Workspace Configuration#

The following settings define where CREDIT will store output files:

save_loc: '/path/to/workspace/'
seed: 1000
  • save_loc: Directory where model weights, logs, and scripts are stored. If it doesn’t exist, CREDIT will create it automatically. The models weights can be large, so make sure ample storage is available.

  • seed: Random seed for reproducibility. Changing this affects experiment results.


Data Configuration#

CREDIT requires multiple types of atmospheric data, formatted in YEARLY .nc or .zarr files, the following variables can be contained within the same, or different files.

Upper-Air Variables#

Upper-Air variables are those which have either pressure or model levels. These variables are considered prognostic (input + output) and have an expected format which covers whole spatial domain and model levels.

variables: ['U', 'V', 'T', 'Q']
save_loc: '/path/to/upper_air_data/'
  • Expected format: (time, level, latitude, longitude)

  • Normalization: Handled automatically by the dataloader—no need to preprocess.

Surface Variables#

Despite being named ‘surface variables’ these are prognostic variables (input &output) that are on single levels, either surface, top-of-model, or somewhere in the middle.

surface_variables: ['SP', 't2m', 'Z500', 'T500', 'U500', 'V500', 'Q500']
save_loc_surface: '/path/to/surface_data/'
  • Expected format: (time, latitude, longitude)

  • Must align with upper-air variable timestamps.

Forcing & Diagnostic Variables#

dynamic_forcing_variables: ['tsi','sst']
save_loc_dynamic_forcing: '/path/to/dynamic_forcing_data/'

diagnostic_variables: ['Z500', 'T500', 'U500', 'V500', 'Q500']
save_loc_diagnostic: '/path/to/diagnostic_data/'
  • Dynamic forcing variables provide additional time-dependent factors (e.g., solar forcing or SST forcing), these are dynamic (changing in time) variables provided during run time.

  • Diagnostic variables are used for evaluation but not directly predicted by the model.

Periodic & Static Forcing#

forcing_variables: ['TSI', 'SST']
save_loc_forcing: '/path/to/forcing_data.nc'

static_variables: ['Z_GDS4_SFC', 'LSM']
save_loc_static: '/path/to/static_data.nc'
  • Periodic forcing: Should cover an entire leap year (e.g., 366 days for an hourly model).

  • Static variables: Must be normalized by the user before use.


Physics and Normalization Files#

CREDIT requires external reference files for conservation physics and data normalization. These files must be provided in .zarr or .nc format.

Physics File: save_loc_physics#

save_loc_physics: '/path/to/physics_data.zarr'
  • Purpose: Stores grid information and coefficients needed for enforcing conservation constraints in the post-processing step (post_block).

  • Required for:

    • Mass conservation (global_mass_fixer)

    • Water conservation (global_water_fixer)

    • Energy conservation (global_energy_fixer)

  • Must include the following variables:

    • For pressure-level grids: lon2d, lat2d (longitude/latitude coordinates).

    • For hybrid sigma-pressure grids: lon2d, lat2d, coef_a, coef_b (sigma coordinate coefficients).

💡 If conservation constraints (post_conf) are enabled, this file is required!


Normalization Files#

CREDIT uses z-score normalization to standardize input variables. The mean and standard deviation files must contain all variables used in the model (upper-air, surface, forcing, diagnostic).

mean_path: '/path/to/mean.nc'
std_path: '/path/to/std.nc'
  • mean_path: NetCDF/Zarr file containing mean values for all variables.

  • std_path: NetCDF/Zarr file containing standard deviation values.

Expected Format#

Both mean_path and std_path should store 1D variables indexed by level:

Variable Type

Expected Dimensions

Example Variables

Upper-Air

(level,)

U, V, T, Q

Surface

()

SP, t2m

Forcing

()

TSI

Diagnostics

()

Z500, T500

💡 Ensure these files contain ALL variables listed in the configuration.yml sections for variables, surface_variables, dynamic_forcing_variables, and diagnostic_variables.


Summary of Key Physics & Normalization Recommendations#

Parameter

Required For

Notes

save_loc_physics

Conservation constraints (post_conf)

Required if conservation physics is enabled.

mean_path

scaler_type: 'std_new'

Required for z-score normalization.

std_path

scaler_type: 'std_new'

Must include all model variables.

Training Data Selection#

train_years: [1979, 2014]  # 1979 - 2013
valid_years: [2014, 2018]  # 2014 - 2017
  • Defines training/validation split. Adjust these to match the dataset.


This section contains critical configuration parameters related to data preprocessing, input structure, and training behavior. Below is an expanded, structured section covering these settings in depth.


Data Preprocessing and Temporal Configuration#

CREDIT supports different data normalization workflows, input histories, and forecast strategies. These settings control how data is preprocessed, how the model receives historical context, and how it is trained to predict future states.

Normalization: scaler_type#

scaler_type: 'std_new'  # Options: 'std_new', 'std_cached'
  • std_new: The recommended approach. Uses z-score normalization with precomputed means and standard deviations from training data.

  • std_cached: Assumes data has already been pre-normalized (e.g., stored in a cached dataset). Use only when working with preprocessed inputs.


Historical Context: history_len#

history_len: 1
valid_history_len: 1
  • history_len: Number of time steps used as input during training.

  • valid_history_len: Same as history_len; modifying this separately is not recommended.

💡 For example, if history_len: 4, the model will use the last 4 time steps to predict the next state.


Forecast Lead Time Configuration#

CREDIT can be trained in single-step or multi-step forecasting mode:

forecast_len: 0
valid_forecast_len: 0
  • forecast_len:

    • 0 → Single-step prediction (predicts only the next time step).

    • 1, 2, 3, ... → Multi-step prediction (predicts several time steps ahead).

  • valid_forecast_len:

    • Can be equal to or smaller than forecast_len.

    • If forecast_len > 1, setting a smaller valid_forecast_len allows shorter validation sequences (useful for debugging).


Multi-Step Training Options#

If forecast_len > 0, CREDIT supports customized backpropagation strategies to improve training efficiency.

backprop_on_timestep: [1, 2, 3, 5, 6, 7]
  • Specifies which time steps contribute to the loss during backpropagation.

  • If unspecified, the trainer will backpropagate on all timesteps

  • Helps control memory usage by skipping certain time steps.

💡 For example, [1, 2, 3, 5, 6, 7] means the model backpropagates on these timesteps but skips others.

retain_graph: False
  • Specifies whether the trainer keeps the computation graph through the autoregressive prediction during training

  • If so, the backpropagation will go from each backprop_on_timestep to the start of the autoregressive rollout

  • Will use a lot more memory

One-Shot Loss Computation#

one_shot: False
  • True: Computes loss only on the final predicted time step (useful for speeding up multi-step training).

  • False: Computes loss at every time step, which may improve stability.


Temporal Resolution and Data Alignment#

CREDIT supports models trained on different time step intervals:

lead_time_periods: 6  # Example: 6-hourly training data
  • Controls the time step between consecutive forecast states.

    • 6 → 6-hourly model (common for ERA5).

    • 1 → Hourly model.


Input Data Ordering: static_first#

CREDIT provides flexibility in how input tensors are structured:

static_first: False
  • True → Order: [static dynamic forcing periodic forcing] (matches older std workflow).

  • False → Order: [dynamic forcing periodic forcing static] (recommended for std_new).

💡 If you are using std_new, set static_first: False.


Dataset Type#

CREDIT supports multiple data loading strategies:

dataset_type: ERA5_MultiStep_Batcher
  • Options:

    • ERA5_MultiStep_Batcher

    • ERA5_and_Forcing_MultiStep

    • ERA5_and_Forcing_SingleStep

    • Ocean_Tensor_Batcher

    • Ocean_MultiStep_Batcher

  • The default (ERA5_MultiStep_Batcher) is recommended for efficient parallel data loading.


Summary of Key Data Processing Recommendations#

Parameter

Recommended Setting

Notes

scaler_type

'std_new'

Ensures data is properly normalized.

history_len

>= 1

Use longer history for improved forecasts.

forecast_len

0 (single-step) or >0 (multi-step)

Multi-step training requires additional tuning.

backprop_on_timestep

[1, 2, 3, 5, 6, 7] (example)

Skipping some timesteps helps manage memory.

one_shot

False

Set True for faster multi-step training.

lead_time_periods

6 (for ERA5)

Controls forecast step size.

static_first

False

Recommended for std_new.

dataset_type

MultiprocessingBatcherPrefetch

Optimized for performance.

Training Configuration#

The trainer section controls how CREDIT handles GPU parallelism, gradient updates, checkpointing, and logging.

Training type and mode#

trainer:
    type: era5-gen1 # era5-gen1, era5-gen2, or conus404
    mode: none  # Options: "none" (single GPU), "fsdp" (fully sharded), "ddp" (distributed)
  • Use era5 for global data

  • Use fsdp or ddp for multi-GPU training.

💡 For large models, fsdp helps distribute computation across multiple GPUs, reducing memory usage.

CREDIT supports single-GPU, multi-GPU, and distributed training.#

FSDP-Specific GPU Optimization#

If using fsdp, you can enable additional optimizations:

cpu_offload: False
activation_checkpoint: True
checkpoint_all_layers: False
  • cpu_offload: Moves gradients to CPU memory (frees GPU memory but can cause CPU OOM errors).

  • activation_checkpoint: Saves activations in forward pass (reduces GPU memory but slows training).

  • checkpoint_all_layers:

    • True → Checkpoints activations for all layers.

    • False → Uses custom layer-wise checkpointing (set in credit/distributed.py).

💡 Use activation_checkpoint: True if training large models on limited memory GPUs.


Torch Compilation#

Torch 2.0 introduces compiling to torchscript to speed up training.

compile: False
  • True → Enables torch.compile() (can improve performance).

  • False → Default setting (recommended for maximum compatibility).

💡 Setting compile: True may break custom models—test before enabling.


Checkpointing & Weight Management#

CREDIT automatically saves and reloads model states. It will warn you if you are trying to load a model when no weights are available. To continue a run (or to extend the multi-step training), it is crucial to set the weight-loading to True.

load_weights: True
load_optimizer: True
load_scaler: True
load_scheduler: True
  • load_weights → Loads existing model weights.

  • load_optimizer → Restores optimizer state (needed for resuming training).

  • load_scaler → Loads mixed-precision gradient scaler (if using AMP).

  • load_scheduler → Restores learning rate scheduler state.

💡 When starting multi-step training, initially set only load_weights: True, then enable all options for full restoration.

Saving Checkpoints#

save_backup_weights: True
save_best_weights: True
  • save_backup_weights → Saves a checkpoint at the start of every epoch (acts as a recovery point).

  • save_best_weights → Saves the best model based on validation loss.

💡 If skip_validation: True, save_best_weights will NOT work!


Logging & Training Metrics#

CREDIT logs training performance in training_log.csv.

save_metric_vars: True
  • True → Saves metrics for all predicted variables.

  • List of variables → Saves only the specified ones:

    save_metric_vars: ["Z500", "Q500", "Q", "T"]
    
  • [] or None → Saves only bulk metrics (averaged over all variables).

💡 Reducing the number of tracked variables speeds up training logs.


Learning Rate Updates#

update_learning_rate: False
  • False → Learning rate is controlled by the scheduler.

  • True → Manually updates optimizer.param_groups.

💡 Set this to False if you are using a scheduler!


Summary of Key Hardware Utilization Recommendations#

Parameter

Recommended Setting

Notes

mode

"fsdp" (for multi-GPU)

"ddp" for simpler parallel training.

cpu_offload

False

Saves GPU memory but can cause CPU OOM errors.

activation_checkpoint

True

Saves memory but slows training.

checkpoint_all_layers

False

Use custom layer-wise checkpointing.

compile

False

Test before enabling (True can break custom models).

save_backup_weights

True

Creates a checkpoint every epoch.

save_best_weights

True

Saves best validation checkpoint (requires skip_validation: False).

save_metric_vars

True (or specify variables)

Controls what gets logged.

update_learning_rate

False

Disable if using a scheduler.

Learning Rate & Optimization#

learning_rate: 1.0e-03
use_scheduler: False
  • Set use_scheduler: True to enable learning rate decay.

Regularization & Weight Decay#

weight_decay: 0
  • L2 regularization: Helps prevent overfitting by penalizing large weights.

  • 0 → Turns off regularization.

  • Typical values: 1e-5 to 1e-3 (increase for stronger regularization).

💡 If training a very deep model, try weight_decay: 1e-4 to reduce overfitting.


Batch Size Configuration#

train_batch_size: 1
valid_batch_size: 1
ensemble_size: 1
  • train_batch_size: Number of samples per training batch.

  • valid_batch_size: Number of samples per validation batch.

  • ensemble_size: Controls stochastic ensemble training (default = 1, meaning deterministic behavior).

💡 For multi-GPU training (fsdp or ddp), the effective batch size = train_batch_size × num_GPUs.


Number of Batches Per Epoch#

batches_per_epoch: 1000
valid_batches_per_epoch: 20
  • batches_per_epoch:

    • 0 → Uses the full dataset.

    • Custom value (e.g., 1000) → Limits the number of training batches per epoch.

  • valid_batches_per_epoch: Controls how many validation batches run per epoch.

💡 Reducing batches_per_epoch helps debug faster before full-scale training.


Early Stopping & Validation Skipping#

stopping_patience: 50
skip_validation: False
  • stopping_patience: Stops training if validation loss does not improve for N epochs.

  • skip_validation:

    • True → Always saves weights, but does NOT run validation.

    • False → Runs validation before saving checkpoints.

💡 If skip_validation: True, save_best_weights will not work.


Epoch & Checkpoint Management#

start_epoch: 0
num_epoch: 10
reload_epoch: True
epochs: &epochs 70
  • start_epoch: First epoch (useful for resuming training).

  • num_epoch: Total epochs before training stops.

  • reload_epoch:

    • True → Reads the last saved epoch and resumes training.

    • False → Starts fresh.

  • epochs: total number of epochs that the scheduler sees

💡 If using epoch-based schedulers, reload_epoch: True ensures proper continuation.


Learning Rate Scheduling#

use_scheduler: False
scheduler:
  scheduler_type: cosine-annealing-restarts
  first_cycle_steps: 250
  cycle_mult: 6.0
  max_lr: 1.0e-05
  min_lr: 1.0e-08
  warmup_steps: 249
  gamma: 0.7
  • use_scheduler → Enables learning rate scheduling (True or False).

  • Supported scheduler types:

    • cosine-annealing → Reduces LR smoothly over epochs.

    • cosine-annealing-restarts → Periodically resets the LR.

    • step-lr → Reduces LR at fixed intervals.

💡 For long training runs, cosine-annealing-restarts helps escape bad local minima by periodically resetting the LR.


Mixed Precision & Gradient Scaling#

To improve GPU memory efficiency, CREDIT supports mixed precision training:

amp: False
mixed_precision:
    param_dtype: "float32"
    reduce_dtype: "float32"
    buffer_dtype: "float32"
  • amp: True → Enables PyTorch’s Automatic Mixed Precision (AMP).

  • mixed_precision → Fine-grained FSDP precision control:

    • param_dtype: Weight precision (e.g., "float32", "bfloat16").

    • reduce_dtype: Precision for gradients during backprop.

    • buffer_dtype: Buffer storage precision.

💡 For large models, use param_dtype: "bfloat16" to reduce memory usage with minimal accuracy loss.


Gradient Accumulation & Clipping#

grad_accum_every: 1
grad_max_norm: 'dynamic'
  • grad_accum_every:

    • 1 → Normal training.

    • >1 → Accumulates gradients over multiple steps before updating weights (useful for small batch sizes).

  • grad_max_norm:

    • 'dynamic' → Uses adaptive gradient clipping.

    • 0 → No clipping.

💡 Enable gradient accumulation (grad_accum_every > 1) if batch size is constrained by memory but you need a higher effective batch size.


CPU Thread & Prefetch Optimization#

CREDIT allows fine-tuning CPU utilization for better dataloader performance.

thread_workers: 4
valid_thread_workers: 4
prefetch_factor: 4
  • thread_workers: Number of CPU threads for loading training data.

  • valid_thread_workers: Number of CPU threads for validation data.

  • prefetch_factor: Number of samples preloaded into the buffer (works with ERA5_MultiStep_Batcher).

💡 Increase thread_workers for faster data loading, but avoid exceeding available CPU cores.


Summary of Key Training Strategy Recommendations#

Parameter

Recommended Setting

Notes

weight_decay

0 (or 1e-4 for deep models)

Helps prevent overfitting.

train_batch_size

1 (increase if possible)

Larger batch size speeds up training.

batches_per_epoch

1000 (or 0 to use all data)

Reduce for faster debugging.

stopping_patience

50

Stops training if no improvement.

skip_validation

False

Needed for save_best_weights.

reload_epoch

True

Ensures proper resumption of training.

use_scheduler

True (if tuning LR)

Improves long-term stability.

amp

False (enable for mixed precision)

Saves GPU memory.

grad_max_norm

'dynamic'

Prevents gradient explosion.

thread_workers

4

Tune based on available CPUs.


Model Configuration#

CREDIT supports multiple architectures. Example:

type: "crossformer"
frames: 1
image_height: 640
image_width: 1280
levels: 16
channels: 4
surface_channels: 7
  • type: Model architecture (crossformer, fuxi, etc.).

  • frames: Number of input states (historical time steps).

  • image_height, image_width: Spatial resolution (latitude × longitude).

  • levels: Number of atmospheric levels.

  • channels: Number of upper-air variables.

Here’s an expanded and structured section detailing the model configuration, including explanations of architecture choices, spatial resolution, patch embeddings, attention mechanisms, and normalization techniques.

Selecting a Model Architecture#

type: "crossformer"
  • crossformer → Default model based on transformer architecture.

  • fuxi → Alternative model architecture.

  • debugger → Debugging mode (useful for checking data flow).

💡 The choice of architecture affects model scalability and computational efficiency.


Temporal and Spatial Resolution#

frames: 1
image_height: 640
image_width: 1280
levels: 16
  • frames: Number of historical time steps used as input.

  • image_height, image_width: Spatial resolution of the input fields (latitude × longitude).

  • levels: Number of vertical pressure levels for upper-air variables.

💡 For higher resolution datasets, ensure these values match the input data format.


Channel Configuration#

channels: 4
surface_channels: 7
input_only_channels: 3
output_only_channels: 0
  • channels → Number of upper-air input variables.

  • surface_channels → Number of surface input variables.

  • input_only_channels → Channels for dynamic forcing, static features, or external variables.

  • output_only_channels → Reserved for diagnostic variables (default = 0).

💡 If using additional input features (e.g., solar forcing), update input_only_channels.


Patch Embedding (For Transformer-Based Models)#

CREDIT supports patch-based embeddings, where the spatial domain is divided into small patches for transformer processing.

patch_width: 1
patch_height: 1
frame_patch_size: 1
  • patch_width, patch_height → Size of each spatial patch (latitude × longitude).

  • frame_patch_size → Number of time steps per patch (default = 1).

💡 Larger patch sizes can reduce computational cost but may impact fine-scale feature representation.


Transformer Depth and Dimensions#

dim: [32, 64, 128, 256]
depth: [2, 2, 2, 2]
  • dim → Hidden size at each transformer layer.

  • depth → Number of transformer blocks per stage.

💡 Deeper models capture more complex patterns but require more memory.


Attention Mechanism#

CREDIT supports global and local attention mechanisms to efficiently model atmospheric dynamics.

global_window_size: [10, 5, 2, 1]
local_window_size: 10
  • global_window_size → Size of global attention windows at each layer.

  • local_window_size → Size of local attention windows.

💡 Smaller window sizes focus on localized interactions, while larger sizes improve long-range dependencies.


Cross-Embedding (Multi-Scale Feature Extraction)#

cross_embed_kernel_sizes:
  - [4, 8, 16, 32]
  - [2, 4]
  - [2, 4]
  - [2, 4]
cross_embed_strides: [2, 2, 2, 2]
  • cross_embed_kernel_sizes → Defines kernel sizes for hierarchical embeddings.

  • cross_embed_strides → Controls how much spatial downsampling occurs.

💡 Larger kernel sizes extract broader-scale features, while smaller strides preserve fine details.


Regularization & Normalization#

CREDIT includes various techniques to improve training stability and prevent overfitting.

attn_dropout: 0.
ff_dropout: 0.
use_spectral_norm: True
  • attn_dropout → Dropout rate for attention layers (default = 0.0).

  • ff_dropout → Dropout rate for feed-forward layers (default = 0.0).

  • use_spectral_norm → Enables spectral normalization (helps with stability in deep networks).

💡 Increase dropout (0.1 - 0.3) for regularization in larger models.


Interpolation & Output Matching#

interp: True
  • True → Interpolates outputs to match input spatial resolution.

  • False → Outputs raw model predictions.

💡 Set interp: True to ensure predictions align with input grid resolution.


Summary of Key Model Recommendations#

Parameter

Recommended Setting

Notes

type

"crossformer"

Default transformer-based model.

frames

1 (or higher)

More frames improve historical context.

image_height, image_width

640 × 1280 (adjust as needed)

Must match input dataset resolution.

levels

16

Number of vertical pressure levels.

dim

[32, 64, 128, 256]

Controls model capacity.

depth

[2, 2, 2, 2]

Number of layers per stage.

global_window_size

[10, 5, 2, 1]

Attention window size per layer.

attn_dropout

0. (increase if overfitting)

Regularization for attention layers.

use_spectral_norm

True

Stabilizes training.

interp

True

Ensures output matches input grid.


Handling Boundary Effects with Padding#

To improve numerical stability at domain edges, CREDIT supports boundary padding.

padding_conf:
    activate: True
    mode: earth
    pad_lat: 80
    pad_lon: 80
  • activate: True → Enables padding at spatial domain edges.

  • mode: 'earth' → Specifies Earth-system-aware padding (useful for atmospheric models), which is described in Schreck et al. 2025

  • pad_lat → Extends padding by 80 latitude points.

  • pad_lon → Extends padding by 80 longitude points.

💡 Padding ensures continuity at boundaries, preventing artifacts in global simulations.


Summary of Key Padding Recommendations#

Parameter

Recommended Setting

Notes

padding_conf.activate

True

Enables domain padding.

padding_conf.mode

'earth'

Uses Earth-system-specific padding.

padding_conf.pad_lat

80

Adjust based on dataset resolution.

padding_conf.pad_lon

80

Ensures global continuity.

Here is a vastly expanded and fully structured explanation of the post-processing (post_conf) section in CREDIT. This covers conservation schemes, tracer corrections, and energy/mass balance adjustments in depth.


Post-Block (post_conf)#

The post-processing block (post_conf) enforces physical conservation constraints on model outputs, correcting imbalances in mass, water, energy, and tracers.

Activating Post-Processing#

post_conf:
    activate: True
  • True → Enables post-processing corrections.

  • False → Disables post-processing (not recommended for production runs).

💡 Always enable post_conf for physically consistent forecasts.


Stochastic Kinetic Energy Backscatter (SKEBS)#

SKEBS introduces stochastic perturbations to correct underdispersed forecasts in weather models.

Based on Berner, J., Shutts, G. J., Leutbecher, M., & Palmer, T. N. (2009). A spectral stochastic kinetic energy backscatter scheme and its impact on flow-dependent predictability in the ECMWF ensemble prediction system. Journal of the Atmospheric Sciences, 66(3), 603-626.

  • True → Enables kinetic energy backscatter corrections (experimental).

  • False → Disables SKEBS.

💡 Enable if testing ensemble perturbations for uncertainty quantification.

skebs:
    activate: True
    freeze_base_model_weights: True  # turn off training of the basemodel

    # skebs module training options
    trainable: True # is skebs trainable at all
    freeze_dissipation_weights: False  # turn off training for dissipation
    freeze_pattern_weights: True  # turn off training for the spectral pattern
    lmax: None # lmax, mmax for spectral transforms
    mmax: None

    # custom initialization of alpha
    alpha_init: 0.95 
    train_alpha: False #trains alpha no matter what

    # dissipation config:
    zero_out_levels_top_of_model: 3 # zero out backscatter at top k levels of the model

    dissipation_scaling_coefficient: 10.
    dissipation_type: FCNN 
    # available types:
    #    - prescribed: fixed dissipation rate spatially, varies by level starts at sigma_max level (see below)
    #    - uniform: fixed dissipation rate spatially, varies by level starts at 2.5
    #    - FCNN: two layer small MLP
    #    - FCNN_wide: four layer wide MLP
    #    - unet: user specified arch, default: unet++
    #    - CNN: single 3x3 convolution with padding for each column

    # unet - see models/unet.py for examples
    # architecture:
    padding: 48

    # prescribed dissipation:
    sigma_max: 2.0 # what sigma level to set as the max wind. perturbation will be roughly sigma_max * std for wind at each level

    # spectral filters, will anneal to 0 from anneal_start (linspace)
    max_pattern_wavenum: 60
    pattern_filter_anneal_start: 40

    max_backscatter_wavenum: 100
    backscatter_filter_anneal_start: 90

    # [Optional] default is off
    train_backscatter_filter: False
    train_pattern_filter: False

    # data config - does the backscatter model get statics variables?
    use_statics: False 

    # [Optional] early skebs shutoff on iteration number:
    iteration_stop: 0 # if 0, skebs is always run

    #### debugging ####
    # write files during training:
    write_train_debug_files: False #writing out files while training, if this is False          
    write_train_every: 999

    # write files during inference
    write_rollout_debug_files: False # saves only when no_grad 

Conservation Schemes#

CREDIT enforces physical conservation laws for:

  1. Water Conservation (tracers, precipitation, evaporation).

  2. Mass Conservation (fixes inconsistencies in pressure/height fields).

  3. Energy Conservation (balances fluxes and temperature).


General Settings for Conservation Fixers#

Each conservation scheme follows these shared settings:

# Applies the correction method
activate: True  

# Converts from normalized values back to real units before applying fixes
denorm: True  

# Runs the correction outside the model (useful for multi-step training)
activate_outside_model: False  

# Specifies the grid type:
#   "pressure" = constant pressure levels
#   "sigma" = hybrid sigma-pressure levels
grid_type: "sigma"

# Required grid variables (latitude, longitude, vertical levels)
lon_lat_level_name: ["lon2d", "lat2d", "coef_a", "coef_b"]

# Specifies whether levels represent layer edges (midpoint=True) or centers (midpoint=False)
midpoint: True  

💡 For sigma-coordinate models, ensure the physics file includes coef_a and coef_b. These are the sigma pressure level files in units Pa and Fraction, respectively


Tracer Fixer: Ensuring Non-Negative Water Content#

This correction ensures no negative values for total water content and precipitation.

tracer_fixer:
    activate: True
    denorm: True
    tracer_name: ["specific_total_water", "total_precipitation"]
    tracer_thres: [0, 0]
  • tracer_name → List of variables to fix (e.g., specific humidity, precipitation).

  • tracer_thres → Threshold values (e.g., 0 means no negative values allowed).

💡 Negative values can appear due to numerical instability—this ensures physically meaningful water content.


Global Mass Fixer#

This correction ensures total mass is conserved across all vertical levels.

global_mass_fixer:
    activate: True
    activate_outside_model: False
    simple_demo: False
    denorm: True
    grid_type: "sigma"
    midpoint: True
    fix_level_num: 7
    lon_lat_level_name: ["lon2d", "lat2d", "coef_a", "coef_b"]
    surface_pressure_name: ["SP"]
    specific_total_water_name: ["specific_total_water"]
  • fix_level_num: 7 → Ensures conservation only up to the 7th level (avoids modifying upper layers).

  • surface_pressure_name → Name of the surface pressure variable (used for pressure-mass balancing).

  • specific_total_water_name → Name of the specific humidity variable.

💡 Use this to prevent mass drift in long-term climate simulations.


Global Water Fixer#

This correction ensures global water conservation by adjusting precipitation and evaporation terms.

global_water_fixer:
    activate: True
    activate_outside_model: False
    simple_demo: False
    denorm: True
    grid_type: "sigma"
    midpoint: True
    lon_lat_level_name: ["lon2d", "lat2d", "coef_a", "coef_b"]
    surface_pressure_name: ["SP"]
    specific_total_water_name: ["specific_total_water"]
    precipitation_name: ["total_precipitation"]
    evaporation_name: ["evaporation"]
  • precipitation_name → Variable name for total precipitation.

  • evaporation_name → Variable name for evaporation flux.

💡 Prevents artificial drift in atmospheric moisture by correcting evaporation/precipitation imbalances.


Global Energy Fixer#

This correction ensures total energy conservation by adjusting heat fluxes, radiation, and wind kinetic energy.

global_energy_fixer:
    activate: True
    activate_outside_model: False
    simple_demo: False
    denorm: True
    grid_type: "sigma"
    midpoint: True
    lon_lat_level_name: ["lon2d", "lat2d", "coef_a", "coef_b"]
    surface_pressure_name: ["SP"]
    air_temperature_name: ["temperature"]
    specific_total_water_name: ["specific_total_water"]
    u_wind_name: ["u_component_of_wind"]
    v_wind_name: ["v_component_of_wind"]
    surface_geopotential_name: ["geopotential_at_surface"]
    TOA_net_radiation_flux_name: ["top_net_solar_radiation", "top_net_thermal_radiation"]
    surface_net_radiation_flux_name: ["surface_net_solar_radiation", "surface_net_thermal_radiation"]
    surface_energy_flux_name: ["surface_sensible_heat_flux", "surface_latent_heat_flux"]

Key Adjustments#

Variable

Purpose

air_temperature_name

Balances total heat content.

specific_total_water_name

Adjusts for latent heat effects.

u_wind_name, v_wind_name

Ensures kinetic energy conservation.

surface_geopotential_name

Ensures consistency with potential energy.

TOA_net_radiation_flux_name

Accounts for top-of-atmosphere radiation balance.

surface_net_radiation_flux_name

Balances incoming and outgoing radiation.

surface_energy_flux_name

Adjusts for surface energy exchanges.

💡 Use this to prevent temperature drift and ensure radiative balance in climate models.


Summary of Key Conservation Fixers#

Fixer

Purpose

Key Variables

Tracer Fixer

Prevents negative water values

"specific_total_water", "total_precipitation"

Mass Fixer

Ensures total air mass conservation

"SP", "specific_total_water"

Water Fixer

Balances precipitation and evaporation

"SP", "total_precipitation", "evaporation"

Energy Fixer

Maintains energy balance (radiation, heat, wind)

"temperature", "surface_net_radiation_flux_name"


Best Practices#

Always enable post_conf for physically consistent model outputs.
Ensure save_loc_physics contains required grid variables (lon2d, lat2d, coef_a, coef_b).
Adjust fix_level_num if conservation should only apply to certain layers.
Test with simple_demo: True first to visualize corrections before full training.

Loss Configuration#

The loss section defines how CREDIT computes training loss, including options for custom loss functions, spectral constraints, and latitude-based weighting.


Selecting the Training Loss Function#

training_loss: "mse"
  • Available loss functions:

    • "mse"Mean Squared Error (default; penalizes large errors).

    • "mae"Mean Absolute Error (more robust to outliers).

    • "huber"Huber Loss (combination of MSE and MAE).

    • "logcosh"Log-Cosh Loss (similar to Huber, smooths large errors).

    • "xtanh" → Custom loss using hyperbolic tangent.

    • "xsigmoid" → Custom loss using sigmoid transformation.

    • "KCRPS" → bias corrected CRPS for ensemble training.

    • "almost-fair-crps"→ bias-corrected CRPS for ensemble training with small ensembles.

💡 mse is recommended for smooth loss surfaces, while huber or logcosh are better for handling outliers.


Power & Spectral Loss#

CREDIT supports spectral and power-based losses to penalize errors in the frequency domain.

use_power_loss: False
use_spectral_loss: False
spectral_lambda_reg: 0.1
spectral_wavenum_init: 20
  • use_power_loss → Enables power spectrum loss (recommended for climate models).

  • use_spectral_loss → Enables spectral loss (alternative to power loss).

  • spectral_lambda_reg → Weighting factor for spectral loss (0.1 = mild effect).

  • spectral_wavenum_initTruncates low-wavenumber components, ensuring loss focuses on fine-scale structures.

💡 Enable only one of use_power_loss or use_spectral_loss—they should not be used together.


Latitude-Based Loss Weighting#

Since Earth’s surface area varies with latitude, CREDIT supports weighting loss by latitude.

latitude_weights: "/path/to/latitude_weights.nc"
use_latitude_weights: True
  • latitude_weights → NetCDF file containing cos(latitude) as a variable (coslat).

  • use_latitude_weights: True → Enables latitude-based weighting to prevent polar regions from dominating training loss.

💡 This is strongly recommended for global models to ensure loss scaling matches physical area coverage.


Variable-Specific Loss Weighting#

CREDIT allows custom loss weighting per variable, ensuring critical variables are penalized more heavily.

use_variable_weights: False
  • True → Enables custom per-variable loss weighting.

  • False → All variables contribute equally to the loss function.

Example: Custom Variable Weights#

variable_weights:
    U: [0.132, 0.123, 0.113, 0.104, 0.095, 0.085, 0.076, 0.067, 0.057, 0.048, 0.039, 0.029, 0.02, 0.011, 0.005]
    V: [0.132, 0.123, 0.113, 0.104, 0.095, 0.085, 0.076, 0.067, 0.057, 0.048, 0.039, 0.029, 0.02, 0.011, 0.005]
    T: [0.132, 0.123, 0.113, 0.104, 0.095, 0.085, 0.076, 0.067, 0.057, 0.048, 0.039, 0.029, 0.02, 0.011, 0.005]
    Q: [0.132, 0.123, 0.113, 0.104, 0.095, 0.085, 0.076, 0.067, 0.057, 0.048, 0.039, 0.029, 0.02, 0.011, 0.005]
    SP: 0.1
    t2m: 1.0
    V500: 0.1
    U500: 0.1
    T500: 0.1
    Z500: 0.1
    Q500: 0.1
  • Upper-air variables (U, V, T, Q): Different weights per level.

  • Surface variables (SP, t2m, etc.): Single weight per variable.

💡 Increase weighting for critical variables (e.g., T500, Z500) to improve accuracy in key forecast fields.


Summary of Key Loss Recommendations#

Parameter

Recommended Setting

Notes

training_loss

"mse" (default)

Use "huber" or "logcosh" if data contains outliers.

use_power_loss

False

Set True to penalize spectral errors.

use_spectral_loss

False

Do not enable both spectral and power loss.

spectral_lambda_reg

0.1

Adjust to control spectral penalty strength.

use_latitude_weights

True

Recommended for global datasets.

use_variable_weights

False

Enable if some variables are more important.

Prediction (Inference) Configuration#

The predict section controls how CREDIT runs forecasts after training, including:

  • Batching and parallel execution

  • Forecast initialization settings

  • Storage format for predicted fields

  • Post-processing options (e.g., low-pass filtering, anomaly computation)


GPU Usage for Inference#

CREDIT supports single-GPU and distributed inference.

mode: none  # Options: "none", "fsdp", "ddp"
  • none → Runs inference on a single GPU.

  • fsdp → Fully Sharded Data Parallel (recommended for multi-GPU).

  • ddp → Distributed Data Parallel (alternative for multi-GPU).

💡 Use fsdp for large models to optimize memory usage during inference.


Batch Size & Ensemble Forecasting#

batch_size: 1
ensemble_size: 1
  • batch_size → Number of forecast initializations processed at once.

  • ensemble_size → Number of ensemble members per initialization.

💡 Increase batch_size if running inference on multiple GPUs.


Forecast Initialization Settings#

CREDIT can initialize forecasts at specific times and run for a set duration.

forecasts:
    type: "custom"
    start_year: 2019
    start_month: 1
    start_day: 1
    start_hours: [0, 12]
    duration: 1152
    days: 10
  • type"custom" (default; allows user-defined start dates).

  • start_year, start_month, start_day → Defines the first forecast initialization.

  • start_hours → List of times per day for initializing forecasts (e.g., 0 for 00Z, 12 for 12Z).

  • duration → Total number of days to initialize forecasts.

    • Should be divisible by the number of GPUs for parallel execution.

  • days → Forecast lead time in days (e.g., 10 = 10-day forecast).

💡 For year-long forecasts, set duration: 365 and start_hours: [0] (daily initialization).


Output Storage & File Naming#

save_forecast: '/path/to/forecast_output/'
  • Defines where forecast outputs are stored.

  • Each initialization creates a separate subdirectory inside save_forecast/.

  • Output files are saved in NetCDF format (.nc).

💡 Ensure the path has enough storage capacity for long-duration forecasts!


Selecting Output Variables#

metadata: '/path/to/metadata/era5.yaml'
  • CREDIT automatically selects which variables to save based on this metadata file.

  • To save all variables, remove save_vars from configuration.yml.

💡 Modify metadata.yaml if custom variables need to be included/excluded.


Low-Pass Filtering for Smoother Predictions#

use_laplace_filter: False
  • True → Applies a low-pass filter to reduce high-frequency noise.

  • False → Saves raw model outputs without filtering.

💡 Enable use_laplace_filter: True if forecasts contain unrealistic high-frequency oscillations.


Climatology File for Anomaly Computation#

CREDIT can compute anomaly correlations using a reference climatology.

climatology: '/path/to/climatology.nc'
  • If provided, rollout_metrics.py will compute Anomaly ACC (Anomaly Correlation Coefficient).

  • If missing, Pearson correlation is used instead.

💡 Use a 30-year climatology (e.g., ERA5 1990-2019) for best results.


Summary of Key Prediction Recommendations#

Parameter

Recommended Setting

Notes

mode

"fsdp" (for multi-GPU)

"none" for single-GPU inference.

batch_size

1 (increase for parallelism)

Processes multiple initializations at once.

ensemble_size

1 (or higher for ensembles)

Supports probabilistic forecasting.

forecasts.start_hours

[0, 12]

Runs forecasts twice daily.

forecasts.duration

365 (for annual forecasting)

Should be divisible by the number of GPUs.

use_laplace_filter

False

Enable if forecasts contain high-frequency noise.

climatology

ERA5 1990-2019

Improves anomaly-based evaluation.

PBS Job Submission (HPC)#

For running CREDIT on NCAR HPC systems (Derecho, Casper):

pbs: 
    conda: "credit-derecho"
    project: "NAML0001"
    job_name: "train_model"
    walltime: "12:00:00"
    nodes: 8
    ncpus: 64
    ngpus: 4
  • nodes, ncpus, ngpus: Adjust based on compute resources.

  • For Casper: Change queue: 'casper' and specify gpu_type: 'v100'.


Troubleshooting#

Issue

Possible Cause

Solution

Training loss does not decrease

Learning rate too high/low

Adjust learning_rate or use a scheduler

Model runs out of memory

Batch size too large

Reduce train_batch_size or enable mixed precision

Output fields look unrealistic

Conservation schemes disabled

Ensure post_conf.activate: True

Forecasts diverge quickly

Model lacks historical context

Increase frames in model configuration

Data loading errors

Incorrect file format or missing variables

Ensure .nc or .zarr format and check save_loc_* paths


Best Practices#

  • Check Data Formats: Ensure variables follow expected dimensions (time, level, lat, lon).

  • Use a Seed for Reproducibility: Keep seed fixed unless testing variations.

  • Enable Conservation Schemes: To maintain physical consistency.

  • Run Small Tests First: Before launching full-scale HPC jobs, test with fewer epochs (num_epoch: 5).


Additional Resources#

NCAR HPC Guide


This guide is a living document—please report issues or suggest improvements! 🚀