credit.datasets.hrrr_download
Standalone utility for downloading HRRR prs GRIB2 data from AWS S3 to local disk.
Downloads are embarrassingly parallel: each timestamp is an independent task
dispatched to a multiprocessing.Pool. Both the grib2 file and its .idx
sidecar are downloaded so that HRRRDataset in local mode can use byte-range
reads rather than scanning the full file.
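To illustrate why the .idx sidecar matters, here is a minimal sketch of how an inventory file can be turned into byte ranges. It assumes the sidecar uses the usual wgrib2-style layout (`message:offset:date:variable:level:forecast:`). The `IdxEntry` type, the `parse_idx` helper, and the sample offsets are made up for the example and are not part of this module.

```python
from typing import NamedTuple, Optional


class IdxEntry(NamedTuple):
    """One GRIB2 message located via the .idx inventory (hypothetical helper type)."""
    variable: str
    level: str
    start: int            # first byte of the message
    end: Optional[int]    # last byte (inclusive); None means "to end of file"


def parse_idx(text: str) -> list[IdxEntry]:
    """Turn wgrib2-style inventory lines into byte ranges (illustrative only)."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split(":")
        # fields: message number, byte offset, date, variable, level, forecast, ...
        rows.append((int(fields[1]), fields[3], fields[4]))
    entries = []
    for i, (offset, variable, level) in enumerate(rows):
        # A message ends one byte before the next message starts.
        end = rows[i + 1][0] - 1 if i + 1 < len(rows) else None
        entries.append(IdxEntry(variable, level, offset, end))
    return entries


# Made-up inventory lines; real offsets differ from file to file.
sample_idx = """\
1:0:d=2022010100:REFC:entire atmosphere:anl:
2:375254:d=2022010100:RETOP:cloud top:anl:
3:531622:d=2022010100:VIL:entire atmosphere:anl:
"""

entries = parse_idx(sample_idx)
retop = next(e for e in entries if e.variable == "RETOP")
print(retop.start, retop.end)  # 375254 531621
```

With a range like this, a reader can seek directly to one message instead of decoding the whole grib2 file.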
Downloaded files follow the native HRRR directory layout used by HRRRDataset
in local mode, so they are immediately usable without any renaming:
v3/v4 (2018-07-12+): {base_path}/hrrr.{YYYYMMDD}/conus/hrrr.t{HH}z.{product}f{FF:02d}.grib2
v1/v2 (before 2018-07-12): {base_path}/hrrr.{YYYYMMDD}/hrrr.t{HH}z.{product}f{FF:02d}.grib2
After downloading, switch mode to "local" in the config.
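The directory layouts listed above can be sketched as a small path builder. This is an illustration of the naming rule only. The function name `local_grib_path` and the constant `CONUS_LAYOUT_START` are hypothetical and are not taken from the module.

```python
from datetime import datetime

# First day of the v3/v4 layout, per the layout note above.
CONUS_LAYOUT_START = datetime(2018, 7, 12)


def local_grib_path(base_path: str, timestamp: datetime, product: str, forecast_hour: int) -> str:
    """Build the local grib2 path for one timestamp (hypothetical helper)."""
    day_dir = f"hrrr.{timestamp:%Y%m%d}"
    filename = f"hrrr.t{timestamp:%H}z.{product}f{forecast_hour:02d}.grib2"
    if timestamp >= CONUS_LAYOUT_START:
        # v3/v4 files sit in an extra "conus" subdirectory.
        return f"{base_path}/{day_dir}/conus/{filename}"
    return f"{base_path}/{day_dir}/{filename}"


new_path = local_grib_path("/data/hrrr", datetime(2022, 1, 1, 6), "wrfprs", 0)
old_path = local_grib_path("/data/hrrr", datetime(2017, 5, 3, 18), "wrfprs", 0)
print(new_path)  # /data/hrrr/hrrr.20220101/conus/hrrr.t06z.wrfprsf00.grib2
print(old_path)  # /data/hrrr/hrrr.20170503/hrrr.t18z.wrfprsf00.grib2
```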
Usage:
python -m credit.datasets.hrrr_download -c config/my_conf.yaml --num-workers 8
Or programmatically:
from credit.datasets.hrrr_download import download_hrrr
download_hrrr(config['data'], num_workers=8, overwrite=False)
Config section used (data.source):
data:
source:
Example_HRRR:
dataset_type: "hrrr"
product: "wrfprs" # Options: "wrfprs", "wrfnat", "wrfsubh"
mode: "local" # mode to use after download
base_path: "/data/hrrr"
forecast_hour: 0
start_datetime: "2022-01-01"
end_datetime: "2022-01-31"
timestep: "1h"
forecast_len: 0
Attributes
logger
parser
Classes
_DownloadTask | Simple struct for downloading HRRR data
Functions
_download_one(task) | Download one grib2 + .idx pair. Returns a status string for logging.
download_hrrr(data_config, num_workers=4, overwrite=False) | Download HRRR grib2 + .idx files from AWS S3 to local disk.
Module Contents
- credit.datasets.hrrr_download.logger
- class credit.datasets.hrrr_download._DownloadTask
Bases: NamedTuple
Simple struct for downloading HRRR data.
- Parameters:
NamedTuple – Lightweight immutable struct for named parameters.
- s3_uri: str
- local_path: str
- overwrite: bool
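As a rough illustration of the three fields above, the sketch below defines a stand-in class with the same shape. `DownloadTask` here is an illustrative mirror, not the module's private `_DownloadTask`, and the bucket name in the URI is a placeholder.

```python
from typing import NamedTuple


class DownloadTask(NamedTuple):
    """Illustrative mirror of the documented fields; not the module's own class."""
    s3_uri: str
    local_path: str
    overwrite: bool


task = DownloadTask(
    s3_uri="s3://example-bucket/hrrr.20220101/conus/hrrr.t00z.wrfprsf00.grib2",  # placeholder URI
    local_path="/data/hrrr/hrrr.20220101/conus/hrrr.t00z.wrfprsf00.grib2",
    overwrite=False,
)

# Fields cannot be reassigned, so a task cannot change after it is queued.
try:
    task.overwrite = True
    mutated = True
except AttributeError:
    mutated = False
print(mutated)  # False

# To vary one field, derive a new tuple instead of mutating the old one.
forced = task._replace(overwrite=True)
print(forced.overwrite, task.overwrite)  # True False
```

An immutable tuple of plain values is a convenient unit of work to hand to pool workers, since each worker receives its own copy.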
- credit.datasets.hrrr_download._download_one(task: _DownloadTask) -> str
Download one grib2 + .idx pair. Returns a status string for logging.
Runs in a worker process and imports s3fs locally, so the pool workers don't need to inherit an open filesystem object from the parent.
- Parameters:
task (_DownloadTask) – Specifications for HRRR download (see _DownloadTask).
- Returns:
- Status string indicating the result of the download attempt, formatted as:
"ok {local_path}" if the file was successfully downloaded.
"skip {local_path}" if the file already exists and overwrite is False.
"miss {s3_uri}" if the file was not found on S3.
- Return type:
str
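Because every status string begins with one of the three keywords described above, results collected from the workers can be tallied with a few lines. This is a sketch only. The `summarize` helper and the sample strings (including the placeholder bucket name) are assumptions for illustration, not part of the module.

```python
from collections import Counter


def summarize(statuses: list[str]) -> Counter:
    """Count results by the leading keyword of each status string (hypothetical helper)."""
    return Counter(status.split(" ", 1)[0] for status in statuses)


# Made-up results in the documented "ok / skip / miss" format.
statuses = [
    "ok /data/hrrr/hrrr.20220101/conus/hrrr.t00z.wrfprsf00.grib2",
    "skip /data/hrrr/hrrr.20220101/conus/hrrr.t01z.wrfprsf00.grib2",
    "miss s3://example-bucket/hrrr.20220101/conus/hrrr.t02z.wrfprsf00.grib2",
    "ok /data/hrrr/hrrr.20220101/conus/hrrr.t03z.wrfprsf00.grib2",
]

counts = summarize(statuses)
print(counts["ok"], counts["skip"], counts["miss"])  # 2 1 1

# Collect the S3 URIs that were not found, e.g. to retry or report them.
missing = [s.split(" ", 1)[1] for s in statuses if s.startswith("miss ")]
print(missing)
```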
- credit.datasets.hrrr_download.download_hrrr(data_config: dict[str, Any], num_workers: int = 4, overwrite: bool = False) -> None
Download HRRR grib2 + .idx files from AWS S3 to local disk.
Each timestamp is downloaded in parallel using a multiprocessing.Pool. Both the grib2 file and its .idx sidecar are fetched so that HRRRDataset in mode: "local" can use fast byte-range reads.
- Parameters:
data_config (dict[str, Any]) – Top-level data config dict (same object passed to HRRRDataset).
num_workers (int, optional) – Number of parallel download workers. Each worker opens its own s3fs connection. Default 4.
overwrite (bool, optional) – Re-download files that already exist on disk. Default False (skip existing files).
- Raises:
ImportError – If s3fs is not installed.
KeyError – If the config is missing required fields.
ValueError – If product is not a recognised HRRR product.
- credit.datasets.hrrr_download.parser