credit.datasets.era5
====================

.. py:module:: credit.datasets.era5


Attributes
----------

.. autoapisummary::

   credit.datasets.era5.logger
   credit.datasets.era5.VALID_FIELD_TYPES


Classes
-------

.. autoapisummary::

   credit.datasets.era5.ERA5Dataset


Module Contents
---------------

.. py:data:: logger

.. py:data:: VALID_FIELD_TYPES

.. py:class:: ERA5Dataset(config, return_target=False)

   Bases: :py:obj:`torch.utils.data.Dataset`


   PyTorch Dataset for processed ERA5 data. Relies on a configuration dictionary to define:
       1) 2D / 3D variables
       2) Start, end and frequency of the datetimes
       3) Path used to glob the data files

   Example YAML configuration:

   .. code-block:: yaml

       data:
         source:
           ERA5:
             prognostic:
               vars_3D: ['T', 'U', 'V', 'Q']
               vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
               path: "<path to prognostic>"
             diagnostic:
               vars_3D: ['T', 'U', 'V', 'Q']
               vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
               path: "<path to diagnostic>"
             static:
               vars_3D: ['T', 'U', 'V', 'Q']
               vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
               path: "<path to static>"
             dynamic_forcing:
               vars_3D: ['T', 'U', 'V', 'Q']
               vars_2D: ['T500', 'U500', 'V500', 'Q500', 'Z500', 'tsi', 't2m', 'SP']
               path: "<path to dynamic forcing>"

       start_datetime: "2017-01-01"
       end_datetime: "2019-12-31"
       timestep: "6h"

   Assumptions:
       1) The data MUST be stored in yearly Zarr or netCDF files with a unique 4-digit year (YYYY) in the file name.
       2) A "time" dimension / coordinate is present.
       3) A "level" dimension represents the vertical level.
       4) The dimension order is ('time', 'level', 'latitude', 'longitude') for 3D variables (omit 'level' for 2D variables).
       5) Data should be chunked efficiently for a fast read (small chunks across the time dimension are recommended).
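
   The ``start_datetime``, ``end_datetime`` and ``timestep`` entries in the example configuration define the sample times. The following is a minimal sketch of how such a range could be expanded with pandas. The variable names are illustrative, and this is not necessarily how ``ERA5Dataset`` builds its ``datetimes`` and ``years`` attributes.

   .. code-block:: python

       import pandas as pd

       # Values copied from the example YAML configuration.
       start_datetime = "2017-01-01"
       end_datetime = "2019-12-31"
       timestep = "6h"

       # Expand the range into evenly spaced timestamps (both ends inclusive).
       datetimes = pd.date_range(start_datetime, end_datetime, freq=pd.Timedelta(timestep))

       # Yearly files are assumed, so the distinct years indicate which files are needed.
       years = sorted({int(year) for year in datetimes.year})

   Because the end point is parsed as midnight, the range stops at ``2019-12-31 00:00``; a later end time would be needed to include the remaining steps of that day.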


   .. py:attribute:: source_name
      :value: 'ERA5'



   .. py:attribute:: return_target
      :value: False



   .. py:attribute:: dt


   .. py:attribute:: num_forecast_steps


   .. py:attribute:: start_datetime


   .. py:attribute:: end_datetime


   .. py:attribute:: datetimes


   .. py:attribute:: years


   .. py:attribute:: file_dict


   .. py:attribute:: var_dict


   .. py:method:: _timestamps()

      Return the total time steps.



   .. py:method:: __len__()


   .. py:method:: _map_files(file_list)

      Create a dictionary to look up the file for a time step.

      :param file_list: List of file paths
      :type file_list: list
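
      A minimal sketch of such a lookup, assuming one file per year with a unique 4-digit year in the file name as stated in the class assumptions. The helper name ``map_files_by_year``, the choice of the year as dictionary key, and the example paths are hypothetical and may differ from the actual implementation.

      .. code-block:: python

          import os
          import re

          def map_files_by_year(file_list):
              """Hypothetical helper: map each 4-digit year to the file that holds it."""
              # Match exactly four digits that are not part of a longer digit run.
              year_pattern = re.compile(r"(?<!\d)(\d{4})(?!\d)")
              file_dict = {}
              for path in file_list:
                  # Search only the file name so digits in parent directories are ignored.
                  file_name = os.path.basename(path)
                  matches = year_pattern.findall(file_name)
                  if len(matches) != 1:
                      raise ValueError(f"Expected exactly one 4-digit year in {file_name!r}")
                  file_dict[int(matches[0])] = path
              return file_dict

          # Made-up paths, for illustration only.
          file_dict = map_files_by_year(["/data/era5_2017.zarr", "/data/era5_2018.zarr"])

      The file for a given timestamp could then be found with ``file_dict[timestamp.year]``.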



   .. py:method:: __getitem__(args)

      Returns a sample of data.

      :param args: Input time step and step index, both supplied by the sampler
      :type args: tuple



   .. py:method:: _open_ds_extract_fields(field_type, t, return_data, is_target=False)

      Open the dataset, reshape and concatenate the variables into a NumPy array,
      and pack it into the return dictionary if the data exists.

      :param field_type: Field type ("prognostic", "diagnostic", etc)
      :type field_type: str
      :param t: Current timestamp
      :type t: pd.Timestamp
      :param return_data: Dictionary of data to return
      :type return_data: dict
      :param is_target: Whether the data is target (y) data rather than input (x) data
      :type is_target: bool



   .. py:method:: _reshape_and_concat(ds_3D, ds_2D)

      Stack 3D variables along level and variable, concatenate with 2D variables, and reorder dimensions.

      :param ds_3D: Xarray dataset with 3D spatial variables
      :type ds_3D: xr.Dataset
      :param ds_2D: Xarray dataset with 2D spatial variables
      :type ds_2D: xr.Dataset
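
      The stacking step can be illustrated with plain NumPy arrays instead of xarray datasets. This is a sketch only: the function name, the dictionary inputs, and the assumed output layout of ``(channel, time, latitude, longitude)`` are illustrative and may not match the method's actual return value.

      .. code-block:: python

          import numpy as np

          def reshape_and_concat(vars_3d, vars_2d):
              """Hypothetical NumPy analogue of stacking 3D and 2D variables into channels.

              vars_3d maps names to arrays shaped (time, level, latitude, longitude);
              vars_2d maps names to arrays shaped (time, latitude, longitude).
              """
              # Stack variables on a new leading axis: (variable, time, level, lat, lon).
              data_3d = np.stack(list(vars_3d.values()), axis=0)
              n_var, n_time, n_level, n_lat, n_lon = data_3d.shape
              # Bring level next to variable, then merge the two into one channel axis.
              data_3d = data_3d.transpose(0, 2, 1, 3, 4)
              data_3d = data_3d.reshape(n_var * n_level, n_time, n_lat, n_lon)
              # Each 2D variable contributes one channel: (variable, time, lat, lon).
              data_2d = np.stack(list(vars_2d.values()), axis=0)
              # Assumed output layout: (channel, time, latitude, longitude).
              return np.concatenate([data_3d, data_2d], axis=0)

          rng = np.random.default_rng(0)
          vars_3d = {name: rng.random((1, 3, 4, 8)) for name in ("T", "U", "V", "Q")}
          vars_2d = {name: rng.random((1, 4, 8)) for name in ("t2m", "SP")}
          combined = reshape_and_concat(vars_3d, vars_2d)

      With four 3D variables on three levels and two 2D variables, the channel axis has ``4 * 3 + 2`` entries, ordered variable by variable and level by level, followed by the 2D variables.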



   .. py:method:: _add_metadata(return_data, t, t_target=None)

      Update metadata dictionary

      :param return_data: Return dictionary
      :type return_data: dict
      :param t: Time step
      :type t: int
      :param t_target: Target time step or None



   .. py:method:: _convert_cf_time(ts)

      Convert pandas timestamp to cftime

      :param ts: pandas timestamp



   .. py:method:: _pop_and_merge_targets(return_data, dim=0)

      Look for target diagnostic and prognostic variables. If both exist, concatenate them along specified dimension.

      :param return_data: Dictionary of current data to return
      :param dim: Concat dimension
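
      A minimal sketch of the pop-and-merge pattern, using NumPy arrays. The dictionary keys ``"target_prognostic"``, ``"target_diagnostic"`` and ``"target"`` are hypothetical placeholders, and the handling of the case where only one of the two exists is an assumption; the real method may use different keys, array types, and fallback behaviour.

      .. code-block:: python

          import numpy as np

          def pop_and_merge_targets(return_data, dim=0):
              """Hypothetical sketch: merge prognostic and diagnostic targets."""
              # Remove both entries from the dictionary (None if a key is absent).
              prognostic = return_data.pop("target_prognostic", None)
              diagnostic = return_data.pop("target_diagnostic", None)
              parts = [arr for arr in (prognostic, diagnostic) if arr is not None]
              if len(parts) == 2:
                  # Both exist: concatenate along the requested dimension.
                  return_data["target"] = np.concatenate(parts, axis=dim)
              elif parts:
                  # Assumption: a single available target is kept unchanged.
                  return_data["target"] = parts[0]
              return return_data

          sample = {
              "target_prognostic": np.zeros((4, 2, 3)),
              "target_diagnostic": np.ones((2, 2, 3)),
          }
          merged = pop_and_merge_targets(sample, dim=0)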



