credit.datasets._utils
Shared file-mapping helpers for ERA5 and MRMS dataset classes.
Provides strftime-based filename parsing and binary-search timestamp-to-file lookup, supporting any temporal file granularity (annual, monthly, daily, etc.).
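Attributes
- _STRFTIME_TO_REGEX
- _STRFTIME_TO_FREQ
Functions
| _strftime_to_regex(fmt) | Convert a strftime format string to a compiled regex. |
| _infer_period_freq(fmt) | Return the finest pd.Period frequency implied by a strftime format. |
| _map_files(file_list, time_fmt) | Build a sorted list of (start, end, path) intervals. |
| _find_file(intervals, t) | Binary-search for the file whose interval covers t. |
| _to_cftime(ts, calendar) | Convert a pandas Timestamp to a cftime.datetime. |
| _start_s3_fs() | Lazily initialize an anonymous s3fs.S3FileSystem instance. |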
Module Contents
- credit.datasets._utils._STRFTIME_TO_REGEX: dict[str, str]
- credit.datasets._utils._STRFTIME_TO_FREQ: list[tuple[str, str]] = [('%S', 's'), ('%M', 'min'), ('%H', 'h'), ('%j', 'D'), ('%d', 'D'), ('%m', 'M')]
- credit.datasets._utils._strftime_to_regex(fmt: str) → re.Pattern
Convert a strftime format string to a compiled regex.
The returned pattern matches the date substring in a filename; use m.group(0) together with the original fmt and strptime to recover the datetime.
- Parameters:
fmt – strftime format string (e.g. "%Y", "%Y%m%d-%H%M%S").
- Returns:
Compiled regex pattern matching the date portion of a filename.
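The conversion described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the DIRECTIVE_TO_REGEX table, the strftime_to_regex name, and the example filename are all assumptions, and the real _STRFTIME_TO_REGEX mapping may cover different directives.

```python
import re
from datetime import datetime

# Hypothetical directive-to-regex table; the real _STRFTIME_TO_REGEX may differ.
DIRECTIVE_TO_REGEX = {
    "%Y": r"\d{4}",
    "%m": r"\d{2}",
    "%d": r"\d{2}",
    "%j": r"\d{3}",
    "%H": r"\d{2}",
    "%M": r"\d{2}",
    "%S": r"\d{2}",
}


def strftime_to_regex(fmt: str) -> re.Pattern:
    """Translate a strftime format into a regex matching the formatted date."""
    parts = []
    i = 0
    while i < len(fmt):
        token = fmt[i:i + 2]
        if token in DIRECTIVE_TO_REGEX:
            parts.append(DIRECTIVE_TO_REGEX[token])
            i += 2
        else:
            # Literal character (e.g. "-" or "_"): escape it for the regex.
            parts.append(re.escape(fmt[i]))
            i += 1
    return re.compile("".join(parts))


fmt = "%Y%m%d-%H%M%S"
pattern = strftime_to_regex(fmt)
m = pattern.search("MRMS_20210704-123000.nc")  # hypothetical filename
# Recover the datetime from the matched substring and the original format.
parsed = datetime.strptime(m.group(0), fmt)
```

The regex only locates the date substring; parsing is left to strptime, which keeps the pattern table small.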
- credit.datasets._utils._infer_period_freq(fmt: str) → str
Return the finest pd.Period frequency implied by a strftime format.
- Parameters:
fmt – strftime format string.
- Returns:
pd.Period frequency string (e.g. "h", "D", "M", "Y").
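One plausible way to derive the frequency is to scan the documented _STRFTIME_TO_FREQ list, which is ordered finest first, and return the first directive found in the format. The sketch below assumes that approach; the function name is hypothetical, and the fallback to "Y" when no listed directive is present is inferred from the documented example values rather than confirmed by the source.

```python
# Copied from the documented _STRFTIME_TO_FREQ value, ordered finest first.
STRFTIME_TO_FREQ = [
    ("%S", "s"), ("%M", "min"), ("%H", "h"),
    ("%j", "D"), ("%d", "D"), ("%m", "M"),
]


def infer_period_freq(fmt: str) -> str:
    """Return the frequency of the finest directive present in fmt."""
    for directive, freq in STRFTIME_TO_FREQ:
        if directive in fmt:
            return freq
    # Assumption: a format with no finer directive is treated as annual.
    return "Y"


annual = infer_period_freq("%Y")
per_second = infer_period_freq("%Y%m%d-%H%M%S")
```

Because "%M" (minute) and "%m" (month) differ only in case, the substring check must stay case-sensitive.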
- credit.datasets._utils._map_files(file_list: list[str], time_fmt: str) → list[tuple[pandas.Timestamp, pandas.Timestamp, str]]
Build a sorted list of (start, end, path) intervals.
For a single file, the interval covers all representable time, so no date parsing is attempted. For multiple files, time_fmt (a strftime format string) is used to extract the date from each file's basename; pd.Period then determines the exact coverage window.
- Parameters:
file_list – Sorted list of file paths returned by glob.
time_fmt – strftime format string, e.g. "%Y", "%Y%m%d-%H%M%S".
- Returns:
List of (start, end, path) tuples sorted by start time.
- Raises:
ValueError – If time_fmt does not match the basename of any file in file_list.
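A rough sketch of the interval construction follows. To stay self-contained it takes the date regex and the period frequency as explicit arguments (date_regex and freq are hypothetical parameters, not part of the documented signature), uses pd.Timestamp.min and pd.Timestamp.max as an assumed stand-in for "all representable time", and raises on the first basename that fails to match. The file paths are illustrative only.

```python
import os
import re
from datetime import datetime

import pandas as pd


def map_files(file_list, time_fmt, date_regex, freq):
    """Return (start, end, path) tuples sorted by start time."""
    if len(file_list) == 1:
        # Single file: cover every representable timestamp, no parsing needed.
        return [(pd.Timestamp.min, pd.Timestamp.max, file_list[0])]

    intervals = []
    for path in file_list:
        m = date_regex.search(os.path.basename(path))
        if m is None:
            raise ValueError(f"{time_fmt!r} does not match {path!r}")
        # pd.Period turns one parsed date into an exact coverage window.
        period = pd.Period(datetime.strptime(m.group(0), time_fmt), freq=freq)
        intervals.append((period.start_time, period.end_time, path))
    return sorted(intervals, key=lambda iv: iv[0])


files = ["/data/era5_2020.zarr", "/data/era5_2019.zarr"]  # hypothetical paths
intervals = map_files(files, "%Y", re.compile(r"\d{4}"), "Y")
```

Letting pd.Period compute start_time and end_time avoids hand-written calendar arithmetic for month lengths and leap years.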
- credit.datasets._utils._find_file(intervals: list[tuple[pandas.Timestamp, pandas.Timestamp, str]], t: pandas.Timestamp) → str
Binary-search for the file whose interval covers t.
- Parameters:
intervals – Sorted list of (start, end, path) tuples.
t – Timestamp to look up.
- Returns:
Path to the file covering t.
- Raises:
KeyError – If no interval covers t.
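The lookup can be illustrated with the standard-library bisect module. This is a sketch under assumptions rather than the module's code: the find_file name and the sample intervals are hypothetical, and the list of start times is rebuilt on every call for brevity, whereas a real implementation would likely keep it cached so that each lookup stays logarithmic.

```python
import bisect

import pandas as pd


def find_file(intervals, t):
    """Return the path whose (start, end) interval contains t."""
    starts = [start for start, _, _ in intervals]
    # Index of the last interval whose start is <= t.
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and intervals[i][0] <= t <= intervals[i][1]:
        return intervals[i][2]
    raise KeyError(f"no file covers {t}")


# Hypothetical annual intervals, sorted by start time.
intervals = [
    (pd.Timestamp("2019-01-01"), pd.Timestamp("2019-12-31 23:59:59"), "era5_2019.zarr"),
    (pd.Timestamp("2020-01-01"), pd.Timestamp("2020-12-31 23:59:59"), "era5_2020.zarr"),
]
path = find_file(intervals, pd.Timestamp("2020-06-15 12:00"))
```

The explicit end-of-interval check matters when files have gaps between them: bisect alone would return the preceding file even if t falls after its end.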
- credit.datasets._utils._to_cftime(ts: pandas.Timestamp, calendar: str) → cftime.datetime
Convert a pandas Timestamp to a cftime.datetime.
- Parameters:
ts – Pandas Timestamp to convert.
calendar – cftime calendar string read from the dataset (e.g. "noleap", "gregorian", "proleptic_gregorian").
- Returns:
cftime.datetime with the specified calendar.
- credit.datasets._utils._start_s3_fs() → s3fs.S3FileSystem
Lazily initialize an anonymous s3fs.S3FileSystem instance.
Called automatically on the first __extract_field__ invocation (itself called within __getitem__) when mode is "remote". The filesystem object is cached in _fs for reuse across later calls.