credit.datasets.era5#
Attributes#
Classes#
ERA5Dataset: PyTorch Dataset for processed ERA5 data, driven by a configuration dictionary.
Module Contents#
- credit.datasets.era5.logger#
- credit.datasets.era5.VALID_FIELD_TYPES#
- class credit.datasets.era5.ERA5Dataset(config, return_target=False)#
Bases: torch.utils.data.Dataset
PyTorch Dataset for processed ERA5 data. Relies on a configuration dictionary to define:
2D / 3D variables
Start, end, and frequency of datetimes
Path to glob for the data
Example YAML configuration#
data:
  source:
    ERA5:
      prognostic:
        vars_3D: ['T', 'U', 'V', 'Q']
        vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
        path: "<path to prognostic>"
      diagnostic:
        vars_3D: ['T', 'U', 'V', 'Q']
        vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
        path: "<path to diagnostic>"
      static:
        vars_3D: ['T', 'U', 'V', 'Q']
        vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
        path: "<path to static>"
      dynamic_forcing:
        vars_3D: ['T', 'U', 'V', 'Q']
        vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
        path: "<path to dynamic forcing>"
  start_datetime: "2017-01-01"
  end_datetime: "2019-12-31"
  timestep: "6h"
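The start_datetime, end_datetime and timestep keys determine which timestamps the dataset can serve and which yearly files it needs. A minimal sketch of that expansion using only the standard library follows. The helper name `expand_datetimes` is hypothetical, and treating the end date as inclusive is an assumption rather than something the docstring states.

```python
from datetime import datetime, timedelta


def expand_datetimes(start, end, step_hours):
    """Hypothetical helper: list every timestamp from start to end (inclusive)."""
    current = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    step = timedelta(hours=step_hours)
    stamps = []
    while current <= stop:
        stamps.append(current)
        current += step
    return stamps


# Values from the example configuration; the "6h" timestep is written as 6 hours.
datetimes = expand_datetimes("2017-01-01", "2019-12-31", 6)

# The distinct years tell us which yearly files would be needed.
years = sorted({ts.year for ts in datetimes})
print(len(datetimes), years)
```

The distinct years derived this way correspond to the yearly files the dataset has to locate.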
- Assumptions:
The data MUST be stored in yearly Zarr or netCDF files with a unique 4-digit year (YYYY) in the file name.
A “time” dimension / coordinate is present.
A “level” dimension represents the vertical level.
Dimension order is (‘time’, ‘level’, ‘latitude’, ‘longitude’) for 3D variables (drop ‘level’ for 2D variables).
Data should be chunked efficiently for fast reads (small chunks across the time dimension are recommended).
- source_name = 'ERA5'#
- return_target = False#
- dt#
- num_forecast_steps#
- start_datetime#
- end_datetime#
- datetimes#
- years#
- file_dict#
- var_dict#
- _timestamps()#
Return the total time steps.
- __len__()#
- _map_files(file_list)#
Create a dictionary to lookup the file for a timestep
- Parameters:
file_list (list) – List of file paths
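Because each file is assumed to carry a unique 4-digit year in its name, a file lookup can be keyed on that year. The sketch below illustrates the idea only; `map_files_by_year`, the regular expression, and the example paths are hypothetical and are not taken from the CREDIT source.

```python
import re
from pathlib import Path

# Exactly four digits that are not part of a longer run of digits.
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def map_files_by_year(file_list):
    """Hypothetical stand-in: map each 4-digit year to its file path."""
    file_dict = {}
    for path in file_list:
        matches = YEAR_PATTERN.findall(Path(path).name)
        if len(matches) != 1:
            raise ValueError(f"Expected exactly one 4-digit year in {path!r}")
        file_dict[int(matches[0])] = path
    return file_dict


files = ["/data/era5_2017.zarr", "/data/era5_2018.zarr"]  # made-up paths
file_dict = map_files_by_year(files)
print(file_dict[2018])
```

A timestamp can then be resolved to its file through its year, for example `file_dict[ts.year]`.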
- __getitem__(args)#
Returns a sample of data.
- Parameters:
args (tuple) – Input time step and step index, both supplied by the sampler
- _open_ds_extract_fields(field_type, t, return_data, is_target=False)#
Opens the dataset, reshapes and concatenates the variables into a NumPy array, and packs it into the return dict if the data exists.
- Parameters:
field_type (str) – Field type (“prognostic”, “diagnostic”, etc)
t (pd.Timestamp) – Current timestamp
return_data (dict) – Dictionary of data to return
is_target (bool) – Flag for if data is x or y data
- _reshape_and_concat(ds_3D, ds_2D)#
Stack 3D variables along level and variable, concatenate with 2D variables, and reorder dimensions.
- Parameters:
ds_3D (xr.Dataset) – Xarray dataset with 3D spatial variables
ds_2D (xr.Dataset) – Xarray dataset with 2D spatial variables
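The stacking step can be illustrated with plain NumPy arrays in place of xarray datasets. This is a sketch under assumptions: the toy sizes, the (time, channel, latitude, longitude) output layout, and the variable-major channel order are chosen for illustration and may differ from what the method actually produces.

```python
import numpy as np

n_level, n_lat, n_lon = 3, 4, 5  # toy sizes

# Two 3D variables shaped (time, level, lat, lon) and three 2D variables
# shaped (time, lat, lon), each holding a single time step.
vars_3d = [np.full((1, n_level, n_lat, n_lon), i, dtype=np.float32) for i in range(2)]
vars_2d = [np.full((1, n_lat, n_lon), 10 + i, dtype=np.float32) for i in range(3)]

# Stack 3D variables on a new leading axis: (variable, time, level, lat, lon).
stacked_3d = np.stack(vars_3d, axis=0)
# Bring time to the front, then merge variable and level into one channel axis.
stacked_3d = stacked_3d.transpose(1, 0, 2, 3, 4).reshape(1, -1, n_lat, n_lon)

# Stack 2D variables as channels: (time, variable, lat, lon).
stacked_2d = np.stack(vars_2d, axis=1)

# Concatenate along the channel axis: 2 * 3 level channels plus 3 surface channels.
combined = np.concatenate([stacked_3d, stacked_2d], axis=1)
print(combined.shape)
```

With this layout, every level of the first 3D variable comes before any level of the second, and the 2D variables follow at the end.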
- _add_metadata(return_data, t, t_target=None)#
Update metadata dictionary
- Parameters:
return_data (dict) – Return dictionary
t (int) – Time step
t_target – Target time step or None
- _convert_cf_time(ts)#
Convert pandas timestamp to cftime
- Parameters:
ts – pandas timestamp
- _pop_and_merge_targets(return_data, dim=0)#
Look for target diagnostic and prognostic variables. If both exist, concatenate them along specified dimension.
- Parameters:
return_data – Dictionary of current data to return
dim – Concat dimension
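The pop-and-merge pattern described above might look like the following sketch. The key names `target_prognostic`, `target_diagnostic` and `target` are hypothetical, NumPy arrays stand in for whatever array type the dataset returns, and keeping a lone target unchanged when only one exists is an assumption, since the docstring only covers the case where both are present.

```python
import numpy as np


def pop_and_merge_targets(return_data, dim=0):
    """Hypothetical stand-in; the key names here are illustrative only."""
    prognostic = return_data.pop("target_prognostic", None)
    diagnostic = return_data.pop("target_diagnostic", None)
    present = [arr for arr in (prognostic, diagnostic) if arr is not None]
    if len(present) == 2:
        # Both exist: concatenate along the requested dimension.
        return_data["target"] = np.concatenate(present, axis=dim)
    elif present:
        # Assumption: a single available target is passed through as-is.
        return_data["target"] = present[0]
    return return_data


sample = {
    "target_prognostic": np.zeros((4, 2, 2)),
    "target_diagnostic": np.ones((3, 2, 2)),
}
merged = pop_and_merge_targets(sample, dim=0)
print(merged["target"].shape)
```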