credit.parser
=============

.. py:module:: credit.parser

.. autoapi-nested-parse::

   parser.py
   -------------------------------------------------------
   Content:
       - validate_args
       - replace_nested_key
       - remove_string_by_pattern
       - credit_main_parser
       - training_data_check
       - predict_data_check


Functions
---------

.. autoapisummary::

   credit.parser.validate_args
   credit.parser.replace_nested_key
   credit.parser.remove_string_by_pattern
   credit.parser.credit_main_parser
   credit.parser.training_data_check
   credit.parser.predict_data_check


Module Contents
---------------

.. py:function:: validate_args(function, argdict, context, ignore=[])

   Prepares ``argdict`` for a call of the form ``function(**argdict)``.
   Checks that every argument required by ``function`` is present in
   ``argdict`` and raises an error if any is missing.  Checks that every
   key in ``argdict`` appears in the signature of ``function`` and
   deletes, with a warning, any that does not.  ``context`` is a string
   added to the warning and error messages to make them more informative.
   ``ignore`` is a list of parameter names to leave in place even if they
   do not appear in the signature.
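
The checks described above can be illustrated with a minimal sketch built on ``inspect.signature``. This is not the CREDIT implementation: the name ``validate_args_sketch``, the ``KeyError`` exception type, and the in-place deletion are assumptions made for illustration, as are the example function ``scale`` and its arguments.

```python
import inspect
import warnings


def validate_args_sketch(function, argdict, context, ignore=()):
    """Illustrative stand-in for the documented behavior, not CREDIT's code."""
    params = inspect.signature(function).parameters

    # Parameters without a default value must be supplied by the caller.
    required = [
        name
        for name, p in params.items()
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    ]
    missing = [name for name in required if name not in argdict]
    if missing:
        # The exception type is an assumption; the source only says "an error".
        raise KeyError(f"{context}: missing required arguments {missing}")

    # Drop keys the function would not accept, unless explicitly ignored.
    for name in list(argdict):
        if name not in params and name not in ignore:
            warnings.warn(f"{context}: removing unexpected argument '{name}'")
            del argdict[name]
    return argdict


def scale(x, factor=2.0):
    """Hypothetical target function."""
    return x * factor


args = {"x": 3.0, "factor": 1.5, "units": "K"}
validate_args_sketch(scale, args, context="scale config")  # warns about 'units'
result = scale(**args)
```

After the call, ``args`` no longer contains ``units``, so ``scale(**args)`` succeeds instead of raising ``TypeError`` for an unexpected keyword argument.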


.. py:function:: replace_nested_key(data, key, value)

   Recursively searches a nested dictionary and sets every occurrence
   of `key`, at any depth, to `value`.  Behavior may be unpredictable if
   the value being replaced is itself a dict.
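
A recursive replacement of this kind might look like the sketch below. It is an illustration under assumptions, not the CREDIT source: the name ``replace_nested_key_sketch``, the in-place update, the returned dictionary, and the configuration keys in the example are all hypothetical. In this sketch a matching key whose value is a dict is overwritten without descending into it.

```python
def replace_nested_key_sketch(data, key, value):
    """Set every occurrence of `key`, at any depth of `data`, to `value`."""
    for k, v in data.items():
        if k == key:
            # Overwrite in place; a dict stored here is replaced, not searched.
            data[k] = value
        elif isinstance(v, dict):
            # Descend into nested dictionaries under other keys.
            replace_nested_key_sketch(v, key, value)
    return data


# Hypothetical configuration fragment with the same key at several depths.
conf = {
    "save_loc": "/old",
    "data": {"save_loc": "/old", "levels": 16},
    "predict": {"forecasts": {"save_loc": "/old"}},
}
replace_nested_key_sketch(conf, "save_loc", "/new")
```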


.. py:function:: remove_string_by_pattern(list_string, pattern)

   Given a list of strings, removes those that match a given pattern.
   Typical usage: removing the 'time', 'datetime', and 'lead_time' coordinates from a list of all coordinate names.
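
The usage described above can be sketched as follows. The matching rule is an assumption: this sketch treats ``pattern`` as a plain substring, which is enough to drop 'time', 'datetime', and 'lead_time' in one pass, but the actual function may match differently. The name ``remove_string_by_pattern_sketch`` is hypothetical.

```python
def remove_string_by_pattern_sketch(list_string, pattern):
    """Return the strings that do NOT contain `pattern` (substring match assumed)."""
    return [s for s in list_string if pattern not in s]


coord_names = ["time", "datetime", "lead_time", "latitude", "longitude", "level"]
spatial_coords = remove_string_by_pattern_sketch(coord_names, "time")
```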


.. py:function:: credit_main_parser(conf, parse_training=True, parse_predict=True, print_summary=False)

   Parses and validates the configuration input for the CREDIT project.

   This function examines the provided configuration dictionary (`conf`), ensures that all required fields are
   present, and assigns default values where necessary. It is designed to be used in various training and
   prediction modules within the CREDIT repository. Missing critical fields will trigger assertion errors, while
   others will receive default values. A standardized version of the input configuration will be returned, ensuring
   consistency across different applications.

   :param conf: Configuration dictionary containing all settings for data, model, trainer, and prediction phases.
   :type conf: dict
   :param parse_training: If True, the function will check for training-specific fields. Defaults to True.
   :type parse_training: bool, optional
   :param parse_predict: If True, the function will check for prediction-specific fields. Defaults to True.
   :type parse_predict: bool, optional
   :param print_summary: If True, a summary of the parsed variables will be printed. Defaults to False.
   :type print_summary: bool, optional

   :returns: The standardized and validated configuration dictionary.
   :rtype: dict

   :raises AssertionError: If any critical fields are missing or invalid in the provided configuration.

   .. rubric:: Notes

   This function is used in the following scripts:

   - applications/train_gen1.py
   - applications/train_multistep.py
   - applications/rollout_to_netcdf.py
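
The general pattern described for this function (assert on critical fields, fill in defaults for the rest, return a standardized dictionary) can be sketched as below. This is a toy illustration, not the CREDIT parser: ``parse_conf_sketch`` and the ``example_option`` key are hypothetical, and copying the input rather than modifying it is an assumption of the sketch.

```python
import copy


def parse_conf_sketch(conf, parse_training=True):
    """Toy version of the assert-or-default pattern; key names are illustrative."""
    conf = copy.deepcopy(conf)  # assumption: leave the caller's dictionary untouched

    # Critical fields: fail fast with an AssertionError.
    assert "data" in conf, "conf must contain a 'data' section"
    if parse_training:
        assert "train_years" in conf["data"], "conf['data'] must contain 'train_years'"

    # Optional fields: assign a default when the user did not provide one.
    conf["data"].setdefault("example_option", False)  # hypothetical key
    return conf


raw_conf = {"data": {"train_years": [1979, 2014]}}
parsed = parse_conf_sketch(raw_conf)
```

Downstream code can then read ``parsed["data"]["example_option"]`` without first checking whether the key exists.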


.. py:function:: training_data_check(conf, print_summary=False)

   Note: this function is designed for model training, NOT for rollout.

   The following items are covered:
       - All yearly files (upper-air, surface, dynamic forcing, diagnostic)
         cover the years in conf['data']['train_years'] and conf['data']['valid_years'].
       - All variables (upper-air, surface, dynamic forcing, diagnostic)
         exist in their corresponding files.
         Note: only one file from each group is checked.
       - All files (upper-air, surface, dynamic forcing, diagnostic, forcing, static, mean, std, lat_weights)
         have the same coordinate names and coordinate values.
         Note: this check covers the lat, lon, and level coordinates and ignores 'time' coordinates.

   Where is it applied?
       - applications/train_gen1.py
       - applications/train_multistep.py
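
The coordinate comparison listed above can be sketched with plain dictionaries standing in for opened files. Everything here is hypothetical: the helper ``coords_match_sketch``, the substring rule for skipping time-like coordinates, and the coordinate values. The real check operates on the data files named in the configuration.

```python
def coords_match_sketch(reference, other, ignore_pattern="time"):
    """Compare coordinate names and values, skipping time-like coordinates."""
    ref_names = sorted(n for n in reference if ignore_pattern not in n)
    other_names = sorted(n for n in other if ignore_pattern not in n)
    if ref_names != other_names:
        return False  # coordinate names differ
    # Names agree; now require identical values for every shared coordinate.
    return all(list(reference[n]) == list(other[n]) for n in ref_names)


# Dictionaries mapping coordinate name -> values (illustrative numbers).
upper_air = {
    "time": [0, 1, 2],
    "latitude": [-90.0, 0.0, 90.0],
    "longitude": [0.0, 180.0],
    "level": [1, 2],
}
static = {"latitude": [-90.0, 0.0, 90.0], "longitude": [0.0, 180.0], "level": [1, 2]}
shifted = {"latitude": [-90.0, 0.0, 90.0], "longitude": [0.0, 90.0], "level": [1, 2]}

consistent = coords_match_sketch(upper_air, static)
inconsistent = coords_match_sketch(upper_air, shifted)
```

Here ``consistent`` is ``True`` because the files differ only in the ignored ``time`` coordinate, while ``inconsistent`` is ``False`` because the longitude values differ.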


.. py:function:: predict_data_check(conf, print_summary=False)

   Note: this function is designed for model rollout.
   Diagnostic variables are checked in the mean and std files only.

   The following items are covered:
       - All variables (upper-air, surface, dynamic forcing)
         exist in their corresponding files.
         Note: only one file from each group is checked.
       - All files (upper-air, surface, dynamic forcing, forcing, static, mean, std, lat_weights)
         have the same coordinate names and coordinate values.
         Note: this check covers the lat, lon, and level coordinates and ignores 'time' coordinates.

   Where is it applied?
       - applications/rollout_to_netcdf_new.py
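
The variable-existence check listed above can be sketched as a comparison between the variable names requested in the configuration and the names found in one file of a group. The helper ``missing_variables_sketch`` and the variable names are hypothetical; the real function reads both from the configuration and the data files.

```python
def missing_variables_sketch(requested, available):
    """Return the requested variable names that are absent from a file."""
    return [name for name in requested if name not in available]


# Illustrative names only: variables requested for one group, and the
# variables found in one file of that group.
requested_upper_air = ["U", "V", "T", "Q"]
file_variables = {"U", "V", "T"}

missing = missing_variables_sketch(requested_upper_air, file_variables)
all_present = not missing
```

Returning the list of missing names, rather than a bare boolean, lets an error message state exactly which variables need attention.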