credit.datasets._utils
======================

.. py:module:: credit.datasets._utils

.. autoapi-nested-parse::

   _file_utils.py
   --------------
   Shared file-mapping helpers for ERA5 and MRMS dataset classes.

   Provides strftime-based filename parsing and binary-search timestamp-to-file
   lookup, supporting any temporal file granularity (annual, monthly, daily, etc.).



Attributes
----------

.. autoapisummary::

   credit.datasets._utils._STRFTIME_TO_REGEX
   credit.datasets._utils._STRFTIME_TO_FREQ


Functions
---------

.. autoapisummary::

   credit.datasets._utils._strftime_to_regex
   credit.datasets._utils._infer_period_freq
   credit.datasets._utils._map_files
   credit.datasets._utils._find_file
   credit.datasets._utils._to_cftime
   credit.datasets._utils._start_s3_fs


Module Contents
---------------

.. py:data:: _STRFTIME_TO_REGEX
   :type:  dict[str, str]

.. py:data:: _STRFTIME_TO_FREQ
   :type:  list[tuple[str, str]]
   :value: [('%S', 's'), ('%M', 'min'), ('%H', 'h'), ('%j', 'D'), ('%d', 'D'), ('%m', 'M')]


.. py:function:: _strftime_to_regex(fmt: str) -> re.Pattern

   Convert a strftime format string to a compiled regex.

   The returned pattern matches the date substring in a filename; use
   ``m.group(0)`` together with the original *fmt* and ``strptime`` to
   recover the datetime.

   :param fmt: strftime format string (e.g. ``"%Y"``, ``"%Y%m%d-%H%M%S"``).

   :returns: Compiled regex pattern matching the date portion of a filename.
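
   A minimal sketch of the idea, not the module's actual implementation. The
   table ``DIRECTIVE_TO_REGEX``, the function name ``strftime_to_regex``, and the
   example filename are assumptions made for illustration; the real
   ``_STRFTIME_TO_REGEX`` mapping may cover different directives.

   .. code-block:: python

      import re
      from datetime import datetime

      # Hypothetical directive-to-regex table (assumption, not the real mapping).
      DIRECTIVE_TO_REGEX = {
          "%Y": r"\d{4}",
          "%m": r"\d{2}",
          "%d": r"\d{2}",
          "%j": r"\d{3}",
          "%H": r"\d{2}",
          "%M": r"\d{2}",
          "%S": r"\d{2}",
      }


      def strftime_to_regex(fmt: str) -> re.Pattern:
          """Translate known directives and escape every other character."""
          parts = []
          i = 0
          while i < len(fmt):
              token = fmt[i:i + 2]
              if token in DIRECTIVE_TO_REGEX:
                  parts.append(DIRECTIVE_TO_REGEX[token])
                  i += 2
              else:
                  parts.append(re.escape(fmt[i]))
                  i += 1
          return re.compile("".join(parts))


      fmt = "%Y%m%d-%H%M%S"
      pattern = strftime_to_regex(fmt)

      # Illustrative filename; search() locates the date substring.
      m = pattern.search("MRMS_20230615-120000.grib2")

      # Recover the datetime from the matched text and the original format.
      parsed = datetime.strptime(m.group(0), fmt)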


.. py:function:: _infer_period_freq(fmt: str) -> str

   Return the finest ``pd.Period`` frequency implied by a strftime format.

   :param fmt: strftime format string.

   :returns: ``pd.Period`` frequency string (e.g. ``"h"``, ``"D"``, ``"M"``, ``"Y"``).
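
   One way such a lookup could work, sketched under assumptions: the table below
   copies the documented value of ``_STRFTIME_TO_FREQ``, which is ordered from
   finest to coarsest, so the first directive found in *fmt* wins. The fallback
   to ``"Y"`` when no listed directive is present is an assumption, as is the
   function name ``infer_period_freq``.

   .. code-block:: python

      # Copied from the documented value of _STRFTIME_TO_FREQ (finest first).
      STRFTIME_TO_FREQ = [
          ("%S", "s"),
          ("%M", "min"),
          ("%H", "h"),
          ("%j", "D"),
          ("%d", "D"),
          ("%m", "M"),
      ]


      def infer_period_freq(fmt: str) -> str:
          """Return the frequency of the finest directive present in fmt."""
          for directive, freq in STRFTIME_TO_FREQ:
              if directive in fmt:
                  return freq
          # Assumed fallback: a format with none of the directives is annual.
          return "Y"


      annual = infer_period_freq("%Y")
      monthly = infer_period_freq("%Y%m")
      per_second = infer_period_freq("%Y%m%d-%H%M%S")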


.. py:function:: _map_files(file_list: list[str], time_fmt: str) -> list[tuple[pandas.Timestamp, pandas.Timestamp, str]]

   Build a sorted list of ``(start, end, path)`` intervals.

   For a single file, the interval covers all representable time, so no
   date parsing is attempted. For multiple files, *time_fmt* (a strftime
   format string) is used to extract the date from each file's basename;
   ``pd.Period`` then determines the exact coverage window of that file.

   :param file_list: Sorted list of file paths (for example, the sorted output of ``glob``).
   :param time_fmt: strftime format string, e.g. ``"%Y"``, ``"%Y%m%d-%H%M%S"``.

   :returns: List of ``(start, end, path)`` tuples sorted by start time.

   :raises ValueError: If *time_fmt* does not match the basename of any file
       in *file_list*.
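
   A hedged sketch of how the intervals could be built for monthly files. The
   helper name ``map_files_sketch``, its extra ``pattern`` and ``freq``
   arguments, and the file paths are hypothetical. Using ``pd.Timestamp.min``
   and ``pd.Timestamp.max`` for the single-file case, and raising on the first
   basename that does not match, are assumptions about behavior the
   description above leaves open.

   .. code-block:: python

      import os
      import re
      from datetime import datetime

      import pandas as pd


      def map_files_sketch(file_list, time_fmt, pattern, freq):
          """Return (start, end, path) tuples sorted by start time."""
          if len(file_list) == 1:
              # Single file: cover every representable timestamp, skip parsing.
              return [(pd.Timestamp.min, pd.Timestamp.max, file_list[0])]

          intervals = []
          for path in file_list:
              m = pattern.search(os.path.basename(path))
              if m is None:
                  raise ValueError(f"{time_fmt!r} does not match {path!r}")
              stamp = datetime.strptime(m.group(0), time_fmt)
              # The Period expands the parsed date to the file's full window.
              period = pd.Period(stamp, freq=freq)
              intervals.append((period.start_time, period.end_time, path))
          return sorted(intervals, key=lambda interval: interval[0])


      # Hypothetical monthly files named with "%Y%m".
      files = ["/data/era5_202001.nc", "/data/era5_202002.nc"]
      intervals = map_files_sketch(files, "%Y%m", re.compile(r"\d{6}"), "M")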


.. py:function:: _find_file(intervals: list[tuple[pandas.Timestamp, pandas.Timestamp, str]], t: pandas.Timestamp) -> str

   Binary-search for the file whose interval covers *t*.

   :param intervals: Sorted list of ``(start, end, path)`` tuples.
   :param t: Timestamp to look up.

   :returns: Path to the file covering *t*.

   :raises KeyError: If no interval covers *t*.
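
   The lookup can be illustrated with the standard-library ``bisect`` module.
   This is a sketch under assumptions, not the module's code: the name
   ``find_file_sketch`` and the file names are hypothetical, and plain
   ``datetime`` objects stand in for ``pandas.Timestamp`` to keep the example
   dependency-free.

   .. code-block:: python

      from bisect import bisect_right
      from datetime import datetime


      def find_file_sketch(intervals, t):
          """Return the path whose [start, end] interval contains t."""
          # Rebuilt per call for brevity; a real implementation would likely
          # cache the start times so each lookup stays O(log n).
          starts = [start for start, _, _ in intervals]

          # Index of the last interval that starts at or before t.
          idx = bisect_right(starts, t) - 1
          if idx >= 0:
              start, end, path = intervals[idx]
              if start <= t <= end:
                  return path
          raise KeyError(f"no file covers {t}")


      # Hypothetical annual files, sorted by start time.
      intervals = [
          (datetime(2020, 1, 1), datetime(2020, 12, 31, 23, 59, 59), "era5_2020.nc"),
          (datetime(2021, 1, 1), datetime(2021, 12, 31, 23, 59, 59), "era5_2021.nc"),
      ]

      path = find_file_sketch(intervals, datetime(2021, 7, 4, 6))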


.. py:function:: _to_cftime(ts: pandas.Timestamp, calendar: str) -> cftime.datetime

   Convert a pandas Timestamp to a cftime.datetime.

   :param ts: Pandas Timestamp to convert.
   :param calendar: cftime calendar string read from the dataset
                    (e.g. ``"noleap"``, ``"gregorian"``, ``"proleptic_gregorian"``).

   :returns: cftime.datetime with the specified calendar.


.. py:function:: _start_s3_fs() -> s3fs.S3FileSystem

   Lazily initialize an anonymous ``s3fs.S3FileSystem`` instance.

   Called automatically on the first invocation of ``__extract_field__`` (itself
   called from ``__getitem__``) when ``mode`` is ``"remote"``. The filesystem object
   is cached in ``_fs`` for reuse by later calls.



