credit.datasets.hrrr_download#

Standalone utility for downloading HRRR prs GRIB2 data from AWS S3 to local disk.

Downloads are embarrassingly parallel: each timestamp is an independent task dispatched to a multiprocessing.Pool. Both the grib2 file and its .idx sidecar are downloaded so that HRRRDataset in local mode can use byte-range reads rather than scanning the full file.
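The dispatch pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `run_parallel` and `tally` are hypothetical names, and the worker is assumed to return status strings that start with a keyword such as "ok", "skip", or "miss".

```python
from collections import Counter
from multiprocessing import Pool
from typing import Any, Callable, Iterable


def tally(statuses: Iterable[str]) -> Counter:
    """Count results by the leading keyword of each status string."""
    return Counter(status.split(" ", 1)[0] for status in statuses)


def run_parallel(
    worker: Callable[[Any], str],
    tasks: list,
    num_workers: int = 4,
) -> Counter:
    """Run one independent task per timestamp across a process pool.

    Hypothetical helper: `worker` must be a module-level function so it
    can be pickled and sent to the pool processes.
    """
    with Pool(processes=num_workers) as pool:
        # Tasks share no state, so completion order does not matter.
        # The iterator is fully consumed by tally() before the pool closes.
        return tally(pool.imap_unordered(worker, tasks))
```

On platforms that start workers with the spawn method, `run_parallel` should be called from under an `if __name__ == "__main__":` guard.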

Downloaded files follow the native HRRR directory layout used by HRRRDataset in local mode, so they are immediately usable without any renaming:

v3/v4 (2018-07-12+): {base_path}/hrrr.{YYYYMMDD}/conus/hrrr.t{HH}z.{product}f{FF:02d}.grib2
v1/v2 (before):      {base_path}/hrrr.{YYYYMMDD}/hrrr.t{HH}z.{product}f{FF:02d}.grib2
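As an illustration of the two layouts, the sketch below builds a local path from an initialization time. The function name `local_hrrr_path` and the constant `CONUS_LAYOUT_START` are hypothetical; the cutover date is taken from the layout note above.

```python
from datetime import datetime

# Assumed cutover, from the layout note: v3/v4 files start on 2018-07-12.
CONUS_LAYOUT_START = datetime(2018, 7, 12)


def local_hrrr_path(
    base_path: str,
    init_time: datetime,
    product: str = "wrfprs",
    forecast_hour: int = 0,
) -> str:
    """Return the local grib2 path for one HRRR initialization time."""
    day_dir = f"hrrr.{init_time:%Y%m%d}"
    filename = f"hrrr.t{init_time:%H}z.{product}f{forecast_hour:02d}.grib2"
    if init_time >= CONUS_LAYOUT_START:
        # v3/v4 layout: files sit in a "conus" subdirectory.
        return f"{base_path}/{day_dir}/conus/{filename}"
    # v1/v2 layout: files sit directly in the day directory.
    return f"{base_path}/{day_dir}/{filename}"
```

Under these assumptions, `local_hrrr_path("/data/hrrr", datetime(2022, 1, 1, 6))` gives `/data/hrrr/hrrr.20220101/conus/hrrr.t06z.wrfprsf00.grib2`.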

After downloading, set mode: "local" on the HRRR source in the config so that HRRRDataset reads from base_path.

Usage:

python -m credit.datasets.hrrr_download -c config/my_conf.yaml --num-workers 8

Or programmatically:

from credit.datasets.hrrr_download import download_hrrr
download_hrrr(config['data'], num_workers=8, overwrite=False)

Config section used (data.source):

data:
  source:
    Example_HRRR:
      dataset_type: "hrrr"
      product: "wrfprs"      # options: "wrfprs", "wrfnat", "wrfsubh"
      mode: "local"          # mode to use after download
      base_path: "/data/hrrr"
      forecast_hour: 0
  start_datetime: "2022-01-01"
  end_datetime:   "2022-01-31"
  timestep:       "1h"
  forecast_len:   0
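To make the time range concrete, the sketch below enumerates the timestamps that such a config could produce, one per download task. It is a guess at the semantics rather than the module's code: `timestamps` and `parse_hours` are hypothetical names, only hour-based steps such as "1h" are handled, and end_datetime is assumed to be inclusive.

```python
from datetime import datetime, timedelta


def parse_hours(timestep: str) -> timedelta:
    """Parse a step such as "1h" or "6h" (hour units only in this sketch)."""
    if not timestep.endswith("h"):
        raise ValueError(f"unsupported timestep: {timestep!r}")
    hours = int(timestep[:-1])
    if hours <= 0:
        raise ValueError("timestep must be positive")
    return timedelta(hours=hours)


def timestamps(start: str, end: str, timestep: str) -> list[datetime]:
    """List every timestamp from start to end, stepping by timestep."""
    step = parse_hours(timestep)
    current = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    result = []
    while current <= stop:  # assumption: the end bound is inclusive
        result.append(current)
        current += step
    return result
```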

Attributes#

Classes#

_DownloadTask

Immutable record describing one download task: S3 source, local destination, and overwrite flag.

Functions#

_download_one(→ str)

Download one grib2 + .idx pair. Returns a status string for logging.

download_hrrr(→ None)

Download HRRR grib2 + .idx files from AWS S3 to local disk.

Module Contents#

credit.datasets.hrrr_download.logger#
class credit.datasets.hrrr_download._DownloadTask#

Bases: NamedTuple

Lightweight immutable record describing one download task: the S3 source URI, the local destination path, and whether an existing local file should be overwritten.

s3_uri: str#
local_path: str#
overwrite: bool#
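A record with these three fields can be sketched as below. `DownloadTask` and `with_idx` are illustrative stand-ins, the bucket name in the example URI is an assumption, and the sidecar is assumed to be the grib2 name with ".idx" appended.

```python
from typing import NamedTuple


class DownloadTask(NamedTuple):
    """Illustrative stand-in with the three documented fields."""

    s3_uri: str
    local_path: str
    overwrite: bool


def with_idx(location: str) -> tuple[str, str]:
    """Return a grib2 location and its sidecar (assumed: same name + ".idx")."""
    return location, location + ".idx"


# Example values only; the bucket name is an assumption, not from this page.
task = DownloadTask(
    s3_uri="s3://noaa-hrrr-bdp-pds/hrrr.20220101/conus/hrrr.t00z.wrfprsf00.grib2",
    local_path="/data/hrrr/hrrr.20220101/conus/hrrr.t00z.wrfprsf00.grib2",
    overwrite=False,
)
```

Because a NamedTuple is immutable and picklable, instances like `task` can be passed safely to worker processes; `task._replace(overwrite=True)` returns a modified copy instead of mutating the original.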
credit.datasets.hrrr_download._download_one(task: _DownloadTask) → str#

Download one grib2 + .idx pair. Returns a status string for logging.

Runs in a worker process. s3fs is imported inside the function so that pool workers do not need to inherit an open filesystem object from the parent.

Parameters:

task (_DownloadTask) – Specifications for HRRR download (see _DownloadTask).

Returns:

Status string indicating the result of the download attempt, formatted as:
  • "ok {local_path}" if the file was successfully downloaded.

  • "skip {local_path}" if the file already exists and overwrite is False.

  • "miss {s3_uri}" if the file was not found on S3.

Return type:

str

credit.datasets.hrrr_download.download_hrrr(data_config: dict[str, Any], num_workers: int = 4, overwrite: bool = False) → None#

Download HRRR grib2 + .idx files from AWS S3 to local disk.

Timestamps are downloaded in parallel by a multiprocessing.Pool, one task per timestamp. Both the grib2 file and its .idx sidecar are fetched so that HRRRDataset in mode: "local" can use fast byte-range reads.

Parameters:
  • data_config (dict[str, Any]) – Top-level data config dict (same object passed to HRRRDataset).

  • num_workers (int, optional) – Number of parallel download workers. Each worker opens its own s3fs connection. Default 4.

  • overwrite (bool, optional) – Re-download files that already exist on disk. Default False (skip existing files).

Raises:
  • ImportError – If s3fs is not installed.

  • KeyError – If the config is missing required fields.

  • ValueError – If product is not a recognised HRRR product.
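The KeyError and ValueError cases can be pictured with the validation sketch below. It is an assumption about how such checks might look, not the module's code: `hrrr_sources` and `KNOWN_PRODUCTS` are hypothetical names, the product list is taken from the example config, and the set of required fields is a guess.

```python
from typing import Any

# Product names listed in the example config on this page.
KNOWN_PRODUCTS = ("wrfprs", "wrfnat", "wrfsubh")


def hrrr_sources(data_config: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return the validated HRRR entries found under data_config["source"]."""
    selected = {}
    # Indexing raises KeyError if the "source" section is absent.
    for name, source in data_config["source"].items():
        if source.get("dataset_type") != "hrrr":
            continue
        # Assumed required fields; a missing one raises KeyError.
        product = source["product"]
        source["base_path"]
        if product not in KNOWN_PRODUCTS:
            raise ValueError(
                f"{name}: unknown product {product!r}, expected one of {KNOWN_PRODUCTS}"
            )
        selected[name] = source
    return selected
```

Validating in the parent process before building any tasks means a malformed config fails once, up front, instead of once per worker.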

credit.datasets.hrrr_download.parser#