credit.datasets.era5

Attributes#

`logger`
`VALID_FIELD_TYPES`

Classes#

ERA5Dataset

PyTorch Dataset for processed ERA5 data, driven by a configuration dictionary.

Module Contents#

credit.datasets.era5.logger#

credit.datasets.era5.VALID_FIELD_TYPES#

class credit.datasets.era5.ERA5Dataset(config, return_target=False)#

Bases: torch.utils.data.Dataset

PyTorch Dataset for processed ERA5 data. Relies on a configuration dictionary to define:

2D / 3D variables
Start, end, and frequency of datetimes
Path to glob for the data

Example YAML configuration#

data:
  source:
    ERA5:
      prognostic:
        vars_3D: ['T', 'U', 'V', 'Q']
        vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
        path: "<path to prognostic>"
      diagnostic:
        vars_3D: ['T', 'U', 'V', 'Q']
        vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
        path: "<path to diagnostic>"
      static:
        vars_3D: ['T', 'U', 'V', 'Q']
        vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
        path: "<path to static>"
      dynamic_forcing:
        vars_3D: ['T', 'U', 'V', 'Q']
        vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
        path: "<path to dynamic forcing>"

start_datetime: "2017-01-01"
end_datetime: "2019-12-31"
timestep: "6h"
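To illustrate how start_datetime, end_datetime and timestep could expand into the datetimes and years attributes listed further down, here is a minimal sketch using only the standard library. The helper name build_datetimes, the inclusive end point, and the hour-based step argument are assumptions made for illustration; the actual class may parse the "6h" string and build the range differently.

```python
from datetime import datetime, timedelta


def build_datetimes(start, end, step_hours):
    """Return every timestamp from start to end (inclusive) at a fixed step.

    Hypothetical helper: the inclusive end is an assumption, not
    confirmed behavior of ERA5Dataset.
    """
    current = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    step = timedelta(hours=step_hours)
    stamps = []
    while current <= stop:
        stamps.append(current)
        current += step
    return stamps


# Two days at a 6-hourly step, a shortened version of the range above.
datetimes = build_datetimes("2017-01-01", "2017-01-02", 6)

# Distinct years touched by the range, used to pick yearly files.
years = sorted({ts.year for ts in datetimes})
```

Deriving the list of years from the timestamps is what makes the one-file-per-year assumption below useful: each timestamp can be routed to a single file.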

Assumptions:

The data MUST be stored in yearly zarr or netCDF files with a unique 4-digit year (YYYY) in the file name.
A “time” dimension / coordinate is present.
A “level” dimension represents the vertical level.
Dimension order is (‘time’, ‘level’, ‘latitude’, ‘longitude’) for 3D vars (drop ‘level’ for 2D vars).
Data should be chunked efficiently for a fast read (small chunks across the time dimension are recommended).

source_name = 'ERA5'#

return_target = False#

dt#

num_forecast_steps#

start_datetime#

end_datetime#

datetimes#

years#

file_dict#

var_dict#

_timestamps()#

Return total time steps

__len__()#

_map_files(file_list)#

Create a dictionary to lookup the file for a timestep

Parameters:

file_list (list) – List of file paths
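The lookup can be pictured with a small sketch that keys each file on the 4-digit year in its name, following the yearly-file assumption stated for this class. The function name map_files_by_year, the regular expression, the error handling, and the example paths are all hypothetical; the real _map_files may key on something other than the year.

```python
import re
from pathlib import Path

# Exactly four digits, not embedded in a longer run of digits.
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def map_files_by_year(file_list):
    """Build {year: path} from file names containing one 4-digit year."""
    file_map = {}
    for path in file_list:
        matches = YEAR_PATTERN.findall(Path(path).name)
        if len(matches) != 1:
            raise ValueError(f"expected one 4-digit year in file name: {path}")
        year = int(matches[0])
        if year in file_map:
            raise ValueError(f"more than one file for year {year}")
        file_map[year] = path
    return file_map


# Hypothetical file names, for illustration only.
files = [
    "/data/era5_prognostic_2017.zarr",
    "/data/era5_prognostic_2018.zarr",
]
file_map = map_files_by_year(files)
```

With such a mapping, finding the file for a timestamp reduces to a dictionary lookup on its year.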

__getitem__(args)#

Returns a sample of data.

Parameters:

args (tuple) – Input time step and step index, both supplied by the sampler

_open_ds_extract_fields(field_type, t, return_data, is_target=False)#

Opens the dataset, reshapes and concatenates the variables into a NumPy array, and packs it into the return dict if the data exists.

Parameters:

field_type (str) – Field type (“prognostic”, “diagnostic”, etc.)
t (pd.Timestamp) – Current timestamp
return_data (dict) – Dictionary of data to return
is_target (bool) – Whether the data is target (y) data rather than input (x) data

_reshape_and_concat(ds_3D, ds_2D)#

Stack 3D variables along level and variable, concatenate with 2D variables, and reorder dimensions.

Parameters:

ds_3D (xr.Dataset) – Xarray dataset with 3D spatial variables
ds_2D (xr.Dataset) – Xarray dataset with 2D spatial variables
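One way to reason about the resulting array is to list the channel order after stacking. The sketch below assumes a variable-major layout (all levels of the first 3D variable, then all levels of the next, then the 2D variables); the documentation only says the 3D variables are stacked along level and variable, so the exact order, the function name channel_layout, and the channel labels are assumptions.

```python
def channel_layout(vars_3d, n_levels, vars_2d):
    """List channel labels for stacked 3D variables followed by 2D variables.

    Assumes variable-major ordering: every level of one 3D variable
    comes before the first level of the next.
    """
    channels = [f"{var}_level{k}" for var in vars_3d for k in range(n_levels)]
    channels.extend(vars_2d)
    return channels


# Four 3D variables on three levels plus two 2D variables.
layout = channel_layout(["T", "U", "V", "Q"], 3, ["t2m", "SP"])
n_channels = len(layout)  # 4 * 3 + 2
```

Under this assumption the channel count is always len(vars_3D) * n_levels + len(vars_2D), which is a convenient check against the model's expected input size.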

_add_metadata(return_data, t, t_target=None)#

Update metadata dictionary

Parameters:

return_data (dict) – Return dictionary
t (int) – Time step
t_target – Target time step or None

_convert_cf_time(ts)#

Convert pandas timestamp to cftime

Parameters:

ts – pandas timestamp

_pop_and_merge_targets(return_data, dim=0)#

Look for target diagnostic and prognostic variables. If both exist, concatenate them along specified dimension.

Parameters:

return_data – Dictionary of current data to return
dim – Concat dimension
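The merge step can be sketched with plain nested lists standing in for arrays. The key names "target_prognostic", "target_diagnostic" and "target" are hypothetical, only concatenation along the first dimension is shown, and passing a lone target through unchanged is an assumption; the documentation only describes the case where both exist.

```python
def pop_and_merge_targets(
    return_data,
    prog_key="target_prognostic",  # hypothetical key name
    diag_key="target_diagnostic",  # hypothetical key name
    out_key="target",              # hypothetical key name
):
    """Pop both target entries and store their concatenation under out_key."""
    prog = return_data.pop(prog_key, None)
    diag = return_data.pop(diag_key, None)
    if prog is not None and diag is not None:
        # List concatenation plays the role of an array concat along dim 0.
        return_data[out_key] = prog + diag
    elif prog is not None:
        return_data[out_key] = prog  # assumption: single target passes through
    elif diag is not None:
        return_data[out_key] = diag  # assumption: single target passes through
    return return_data


sample = {
    "x": [[0.0, 0.0]],
    "target_prognostic": [[1.0, 2.0]],
    "target_diagnostic": [[3.0, 4.0]],
}
merged = pop_and_merge_targets(sample)
```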