credit.datasets._utils

Shared file-mapping helpers for ERA5 and MRMS dataset classes.

Provides strftime-based filename parsing and binary-search timestamp-to-file lookup, supporting any temporal file granularity (annual, monthly, daily, etc.).

Attributes

Functions

_strftime_to_regex(→ re.Pattern)

Convert a strftime format string to a compiled regex.

_infer_period_freq(→ str)

Return the finest pd.Period frequency implied by a strftime format.

_map_files(→ list[tuple[pandas.Timestamp, ...)

Build a sorted list of (start, end, path) intervals.

_find_file(→ str)

Binary-search for the file whose interval covers t.

_to_cftime(→ cftime.datetime)

Convert a pandas Timestamp to a cftime.datetime.

_start_s3_fs(→ s3fs.S3FileSystem)

Lazily initialize an anonymous s3fs.S3FileSystem instance.

Module Contents

credit.datasets._utils._STRFTIME_TO_REGEX: dict[str, str]
credit.datasets._utils._STRFTIME_TO_FREQ: list[tuple[str, str]] = [('%S', 's'), ('%M', 'min'), ('%H', 'h'), ('%j', 'D'), ('%d', 'D'), ('%m', 'M')]
credit.datasets._utils._strftime_to_regex(fmt: str) → re.Pattern

Convert a strftime format string to a compiled regex.

The returned pattern matches the date substring in a filename. To recover the datetime, pass m.group(0) (where m is the resulting match object) and the original fmt to strptime.

Parameters:

fmt – strftime format string (e.g. "%Y", "%Y%m%d-%H%M%S").

Returns:

Compiled regex pattern matching the date portion of a filename.
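
The conversion described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the contents of _STRFTIME_TO_REGEX are not shown on this page, so the DIRECTIVE_PATTERNS table, the strftime_to_regex name, and the example filename are all assumptions.

```python
import re
from datetime import datetime

# Hypothetical directive-to-regex table (assumption: the real
# _STRFTIME_TO_REGEX mapping may cover different directives).
DIRECTIVE_PATTERNS = {
    "%Y": r"\d{4}",
    "%m": r"\d{2}",
    "%d": r"\d{2}",
    "%j": r"\d{3}",
    "%H": r"\d{2}",
    "%M": r"\d{2}",
    "%S": r"\d{2}",
}


def strftime_to_regex(fmt: str) -> re.Pattern:
    """Translate a strftime format into a regex for the date substring."""
    parts = []
    i = 0
    while i < len(fmt):
        token = fmt[i:i + 2]
        if token in DIRECTIVE_PATTERNS:
            # Known directive: substitute its digit pattern.
            parts.append(DIRECTIVE_PATTERNS[token])
            i += 2
        else:
            # Literal character (e.g. "-"): escape it for the regex.
            parts.append(re.escape(fmt[i]))
            i += 1
    return re.compile("".join(parts))


fmt = "%Y%m%d-%H%M%S"
pattern = strftime_to_regex(fmt)
m = pattern.search("MRMS_20230615-120000.grib2")  # hypothetical filename
parsed = datetime.strptime(m.group(0), fmt)
```

Here m.group(0) is the matched date substring, and strptime with the same fmt turns it back into a datetime.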

credit.datasets._utils._infer_period_freq(fmt: str) → str

Return the finest pd.Period frequency implied by a strftime format.

Parameters:

fmt – strftime format string.

Returns:

pd.Period frequency string (e.g. "h", "D", "M", "Y").
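
One plausible reading of this lookup, using the _STRFTIME_TO_FREQ list shown above: scan the directives from finest to coarsest and return the first one present in fmt. The fallback to "Y" when none match is an assumption based on the example return values, and infer_period_freq is an illustrative name.

```python
# Ordered finest to coarsest, as listed for _STRFTIME_TO_FREQ.
STRFTIME_TO_FREQ = [
    ("%S", "s"), ("%M", "min"), ("%H", "h"),
    ("%j", "D"), ("%d", "D"), ("%m", "M"),
]


def infer_period_freq(fmt: str) -> str:
    """Return the frequency of the finest directive found in fmt."""
    for directive, freq in STRFTIME_TO_FREQ:
        if directive in fmt:
            return freq
    # Assumption: a format with none of the directives above is annual.
    return "Y"


freqs = {f: infer_period_freq(f) for f in ("%Y", "%Y%m", "%Y%j", "%Y%m%d-%H%M%S")}
```

Because the list is ordered, a format such as "%Y%m%d-%H%M%S" resolves to seconds even though it also contains coarser directives.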

credit.datasets._utils._map_files(file_list: list[str], time_fmt: str) → list[tuple[pandas.Timestamp, pandas.Timestamp, str]]

Build a sorted list of (start, end, path) intervals.

For a single file, the interval covers all representable time, so no date parsing is attempted. For multiple files, time_fmt (a strftime format string) is used to extract the date from the basename of each file; pd.Period then determines the exact coverage window.

Parameters:
  • file_list – Sorted list of file paths returned by glob.

  • time_fmt – strftime format string, e.g. "%Y", "%Y%m%d-%H%M%S".

Returns:

List of (start, end, path) tuples sorted by start time.

Raises:

ValueError – If time_fmt does not match the basename of any file in file_list.
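
The interval construction can be sketched as below. This is an illustration under stated assumptions rather than the module's code: the sketch takes the compiled date regex and the period frequency as explicit arguments (date_regex and freq are hypothetical parameters), uses pd.Timestamp.min and pd.Timestamp.max for "all representable time", raises on the first basename that does not match, and the file paths are invented.

```python
import os
import re
from datetime import datetime

import pandas as pd


def map_files(file_list, time_fmt, date_regex, freq):
    """Sketch: build sorted (start, end, path) intervals from filenames."""
    if len(file_list) == 1:
        # Single file: cover the whole representable range, no parsing.
        return [(pd.Timestamp.min, pd.Timestamp.max, file_list[0])]

    intervals = []
    for path in file_list:
        m = date_regex.search(os.path.basename(path))
        if m is None:
            raise ValueError(f"{time_fmt!r} does not match {path!r}")
        stamp = datetime.strptime(m.group(0), time_fmt)
        # pd.Period expands the parsed date to its full coverage window.
        period = pd.Period(stamp, freq=freq)
        intervals.append((period.start_time, period.end_time, path))
    return sorted(intervals, key=lambda iv: iv[0])


# Hypothetical monthly files, deliberately out of order.
files = ["/data/era5_202002.nc", "/data/era5_202001.nc"]
intervals = map_files(files, "%Y%m", re.compile(r"\d{6}"), "M")
```

With a monthly frequency, each interval runs from the first instant of the month to the last instant pd.Period reports for it, so adjacent files do not overlap.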

credit.datasets._utils._find_file(intervals: list[tuple[pandas.Timestamp, pandas.Timestamp, str]], t: pandas.Timestamp) → str

Binary-search for the file whose interval covers t.

Parameters:
  • intervals – Sorted list of (start, end, path) tuples.

  • t – Timestamp to look up.

Returns:

Path to the file covering t.

Raises:

KeyError – If no interval covers t.
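
A minimal sketch of such a lookup with the standard-library bisect module follows. It assumes interval ends are inclusive, and it uses datetime objects in place of pandas.Timestamp to stay self-contained (both compare the same way); find_file and the paths are illustrative names.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def find_file(intervals, t):
    """Return the path whose [start, end] interval contains t."""
    # Rebuilt on each call for brevity; a real caller would cache this list.
    starts = [start for start, _, _ in intervals]
    # Index of the last interval whose start is <= t.
    idx = bisect_right(starts, t) - 1
    if idx >= 0:
        start, end, path = intervals[idx]
        if start <= t <= end:
            return path
    raise KeyError(f"no file covers {t}")


tick = timedelta(microseconds=1)
intervals = [
    (datetime(2020, 1, 1), datetime(2020, 2, 1) - tick, "a.nc"),
    (datetime(2020, 2, 1), datetime(2020, 3, 1) - tick, "b.nc"),
]
hit = find_file(intervals, datetime(2020, 2, 15))
```

The explicit start <= t <= end check matters: bisect alone only finds the nearest preceding start, so a timestamp in a gap or past the last file would otherwise be mapped to the wrong path.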

credit.datasets._utils._to_cftime(ts: pandas.Timestamp, calendar: str) → cftime.datetime

Convert a pandas Timestamp to a cftime.datetime.

Parameters:
  • ts – Pandas Timestamp to convert.

  • calendar – cftime calendar string read from the dataset (e.g. "noleap", "gregorian", "proleptic_gregorian").

Returns:

cftime.datetime with the specified calendar.

credit.datasets._utils._start_s3_fs() → s3fs.S3FileSystem

Lazily initialize an anonymous s3fs.S3FileSystem instance.

Called automatically on the first invocation of __extract_field__ (itself called within __getitem__) when mode is "remote". The filesystem object is cached in _fs for reuse across later calls.