credit.datasets.hrrr_download
=============================

.. py:module:: credit.datasets.hrrr_download

.. autoapi-nested-parse::

   hrrr_download.py
   ----------------
   Standalone utility for downloading HRRR GRIB2 data (``wrfprs``, ``wrfnat``, or ``wrfsubh`` products) from AWS S3 to local disk.

   Downloads are embarrassingly parallel: each timestamp is an independent task
   dispatched to a ``multiprocessing.Pool``.  Both the GRIB2 file and its ``.idx``
   sidecar (a text index giving the byte offset of each GRIB message) are
   downloaded so that ``HRRRDataset`` in local mode can use byte-range reads
   rather than scanning the full file.

   Downloaded files follow the native HRRR directory layout used by ``HRRRDataset``
   in local mode, so they are immediately usable without any renaming::

       v3/v4 (2018-07-12 onward): {base_path}/hrrr.{YYYYMMDD}/conus/hrrr.t{HH}z.{product}f{FF:02d}.grib2
       v1/v2 (before 2018-07-12): {base_path}/hrrr.{YYYYMMDD}/hrrr.t{HH}z.{product}f{FF:02d}.grib2
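
   The layout above can be sketched as a small path builder. This is illustrative
   only: ``hrrr_relpath`` and ``CONUS_LAYOUT_START`` are hypothetical names, and the
   sketch assumes the 2018-07-12 switch is decided by the initialization date::

       from datetime import datetime

       # Hypothetical constant: first init date using the v3/v4 "conus/" layout.
       CONUS_LAYOUT_START = datetime(2018, 7, 12)


       def hrrr_relpath(init_time: datetime, product: str, forecast_hour: int) -> str:
           """Return the HRRR file path relative to base_path (sketch)."""
           day_dir = f"hrrr.{init_time:%Y%m%d}"
           fname = f"hrrr.t{init_time:%H}z.{product}f{forecast_hour:02d}.grib2"
           if init_time >= CONUS_LAYOUT_START:
               return f"{day_dir}/conus/{fname}"  # v3/v4 layout
           return f"{day_dir}/{fname}"  # v1/v2 layout

   Joining the result onto ``base_path`` (for example with ``os.path.join``) gives
   the local destination; ``hrrr_relpath(datetime(2022, 1, 1, 6), "wrfprs", 0)``
   returns ``"hrrr.20220101/conus/hrrr.t06z.wrfprsf00.grib2"``.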

   After downloading, switch ``mode`` to ``"local"`` in the config.

   Usage::

       python -m credit.datasets.hrrr_download -c config/my_conf.yaml --num-workers 8

   Or programmatically::

       from credit.datasets.hrrr_download import download_hrrr
       download_hrrr(config['data'], num_workers=8, overwrite=False)

   Config section used (``data.source``)::

       data:
         source:
           Example_HRRR:
             dataset_type: "hrrr"
             product: "wrfprs"      # options: "wrfprs", "wrfnat", "wrfsubh"
             mode: "local"          # mode to use after download
             base_path: "/data/hrrr"
             forecast_hour: 0
         start_datetime: "2022-01-01"
         end_datetime:   "2022-01-31"
         timestep:       "1h"
         forecast_len:   0
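
   One plausible reading of ``start_datetime``, ``end_datetime``, and ``timestep`` is
   an inclusive range of initialization times, one download task per time. The
   sketch below assumes that interpretation, ISO-style date strings, and an hourly
   step written as ``"<N>h"``; the helper ``init_times`` is hypothetical, and the
   module itself may parse these fields differently::

       from datetime import datetime, timedelta
       from typing import List


       def init_times(start: str, end: str, timestep: str) -> List[datetime]:
           """Enumerate init times from start to end, inclusive (sketch)."""
           if not timestep.endswith("h"):
               raise ValueError(f"only hourly steps such as '1h' are handled: {timestep!r}")
           step = timedelta(hours=int(timestep[:-1]))
           if step <= timedelta(0):
               raise ValueError("timestep must be positive")  # avoid an endless loop
           t, stop = datetime.fromisoformat(start), datetime.fromisoformat(end)
           times = []
           while t <= stop:
               times.append(t)
               t += step
           return times

   Under this reading, the example config yields 721 hourly times, from
   2022-01-01 00Z through 2022-01-31 00Z, because a bare date parses as midnight.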


Attributes
----------

.. autoapisummary::

   credit.datasets.hrrr_download.logger
   credit.datasets.hrrr_download.parser


Classes
-------

.. autoapisummary::

   credit.datasets.hrrr_download._DownloadTask


Functions
---------

.. autoapisummary::

   credit.datasets.hrrr_download._download_one
   credit.datasets.hrrr_download.download_hrrr


Module Contents
---------------

.. py:data:: logger

.. py:class:: _DownloadTask

   Bases: :py:obj:`NamedTuple`


   Immutable record describing one HRRR download: the S3 source, the local
   destination, and whether to replace a file that already exists.


   .. py:attribute:: s3_uri
      :type:  str


   .. py:attribute:: local_path
      :type:  str


   .. py:attribute:: overwrite
      :type:  bool


.. py:function:: _download_one(task: _DownloadTask) -> str

   Download one GRIB2 file and its ``.idx`` sidecar, and return a status string for logging.

   Runs in a worker process and imports ``s3fs`` locally, so pool workers do not
   need to inherit an open filesystem object from the parent.

   :param task: Source URI, destination path, and overwrite flag for one download (see :py:class:`_DownloadTask`).
   :type task: _DownloadTask

   :returns: Status string describing the outcome of the download attempt:

       - ``"ok    {local_path}"`` if the file was downloaded successfully.
       - ``"skip  {local_path}"`` if the file already exists and *overwrite* is ``False``.
       - ``"miss  {s3_uri}"`` if the file was not found on S3.

   :rtype: str
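
   The control flow described above can be sketched as follows. This is an
   illustrative reimplementation, not the module's code: ``DownloadTask``,
   ``download_one``, and the optional ``fs`` argument (which lets a test inject a
   stand-in filesystem) are hypothetical, anonymous S3 access is an assumption, and
   only the GRIB2 file is checked when deciding whether to skip::

       import os
       from typing import NamedTuple


       class DownloadTask(NamedTuple):
           """Hypothetical stand-in mirroring the fields of _DownloadTask."""

           s3_uri: str
           local_path: str
           overwrite: bool


       def download_one(task: DownloadTask, fs=None) -> str:
           """Fetch one GRIB2 file plus its .idx sidecar; return a status string."""
           # Skip early if the GRIB2 file is already on disk and overwrite is off.
           if not task.overwrite and os.path.exists(task.local_path):
               return f"skip  {task.local_path}"
           if fs is None:
               # Import inside the worker so each process builds its own client.
               import s3fs

               # Assumption: the bucket is public, so anonymous access suffices.
               fs = s3fs.S3FileSystem(anon=True)
           if not fs.exists(task.s3_uri):
               return f"miss  {task.s3_uri}"
           # Create the hrrr.YYYYMMDD[/conus] directory tree as needed.
           os.makedirs(os.path.dirname(task.local_path) or ".", exist_ok=True)
           fs.get(task.s3_uri, task.local_path)
           fs.get(task.s3_uri + ".idx", task.local_path + ".idx")
           return f"ok    {task.local_path}"

   Passing any object with ``exists`` and ``get`` methods as ``fs`` exercises the
   skip, miss, and ok branches without network access.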


.. py:function:: download_hrrr(data_config: dict[str, Any], num_workers: int = 4, overwrite: bool = False) -> None

   Download HRRR grib2 + .idx files from AWS S3 to local disk.

   Timestamps are downloaded in parallel, one task per timestamp, using a ``multiprocessing.Pool``.
   Both the grib2 file and its ``.idx`` sidecar are fetched so that
   ``HRRRDataset`` in ``mode: "local"`` can use fast byte-range reads.

   :param data_config: Top-level ``data`` config dict (same object passed to
                       ``HRRRDataset``).
   :type data_config: dict[str, Any]
   :param num_workers: Number of parallel download workers.  Each worker opens
                       its own ``s3fs`` connection.  Default ``4``.
   :type num_workers: int, optional
   :param overwrite: Re-download files that already exist on disk. Default
                     ``False`` (skip existing files).
   :type overwrite: bool, optional

   :raises ImportError: If ``s3fs`` is not installed.
   :raises KeyError: If the config is missing required fields.
   :raises ValueError: If *product* is not a recognised HRRR product.
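
   The fan-out can be pictured as below: a ``multiprocessing.Pool`` maps a worker
   over the task list, and the returned status strings are logged and tallied by
   their first word. This is a sketch; ``run_downloads`` is hypothetical, the serial
   branch for ``num_workers <= 1`` is an addition for debugging, and ``worker`` must
   be a picklable top-level function shaped like ``_download_one``::

       import logging
       from collections import Counter
       from multiprocessing import Pool
       from typing import Callable, Iterable

       logger = logging.getLogger(__name__)


       def run_downloads(tasks: Iterable, worker: Callable[..., str], num_workers: int = 4) -> Counter:
           """Apply worker to every task and tally statuses ("ok", "skip", "miss")."""
           counts: Counter = Counter()

           def record(status: str) -> None:
               logger.info(status)
               counts[status.split(maxsplit=1)[0]] += 1

           if num_workers <= 1:
               # Serial path: same results, easier to step through in a debugger.
               for task in tasks:
                   record(worker(task))
               return counts

           with Pool(processes=num_workers) as pool:
               # Completion order does not matter, so imap_unordered is sufficient.
               for status in pool.imap_unordered(worker, tasks):
                   record(status)
           return counts

   Because every task writes to a distinct local path, the workers need no
   coordination beyond the pool itself.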


.. py:data:: parser
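
   The command-line usage in the module docstring implies at least two options,
   ``-c`` (config file) and ``--num-workers``. A minimal ``argparse`` sketch of such
   a parser follows; the ``dest`` name, help text, and the default of ``4`` (borrowed
   from :py:func:`download_hrrr`) are assumptions, and the real parser may define
   further options::

       import argparse

       parser = argparse.ArgumentParser(
           description="Download HRRR GRIB2 + .idx files from AWS S3 to local disk."
       )
       # Path to the YAML config whose "data" section is passed to download_hrrr.
       parser.add_argument("-c", dest="config", required=True,
                           help="Path to a YAML config containing a 'data' section.")
       # argparse stores "--num-workers" as args.num_workers.
       parser.add_argument("--num-workers", type=int, default=4,
                           help="Number of parallel download processes.")

   The parsed ``config`` path would then be loaded and its ``data`` section passed
   to :py:func:`download_hrrr` together with ``num_workers``.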