Forecast API Server

CREDIT ships a lightweight FastAPI server that loads a trained model once and serves autoregressive forecasts over HTTP. It is designed for scenarios where you want to run forecasts on demand without the overhead of submitting a PBS job each time — for example, a demo service, a shared inference node, or a container in a Kubernetes cluster.

Quick start

# Install the extra dependencies
pip install "miles-credit[serve]"    # fastapi + uvicorn (quoted so zsh does not expand the brackets)

# Point at your config and launch
export CREDIT_CONFIG=/path/to/my_run.yml
uvicorn applications.api:app --host 0.0.0.0 --port 8000

The server loads the model and all normalisation statistics at startup and keeps them in GPU memory. Each /forecast request runs the rollout and writes output NetCDF files to disk.

Endpoints

GET /health

Returns 200 immediately. Use this as your liveness/readiness probe.

curl http://localhost:8000/health
{"status": "ok", "model_loaded": true, "device": "cuda:0"}

model_loaded is false until the lifespan startup finishes. The model weights are read from disk at startup, which can take 10–30 s.
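Because model_loaded only flips to true once startup completes, a client or deploy script may want to wait for it before sending forecasts. The following is a minimal sketch using only the Python standard library; the helper names (fetch_health, wait_until_ready) and the wait and interval defaults are illustrative assumptions, not part of CREDIT.

```python
import json
import time
import urllib.request


def fetch_health(base_url, timeout=5.0):
    """GET <base_url>/health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(base_url, fetch=fetch_health, max_wait=120.0, interval=2.0):
    """Poll /health until model_loaded is true. Return False if max_wait elapses."""
    deadline = time.monotonic() + max_wait
    while True:
        try:
            if fetch(base_url).get("model_loaded") is True:
                return True
        except OSError:
            # Covers urllib.error.URLError and refused connections:
            # the server is probably not listening yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A typical call would be wait_until_ready("http://localhost:8000"). The fetch argument is injectable so the polling logic can be exercised without a running server.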


POST /forecast

Run an autoregressive forecast.

Request body (JSON):

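| Field        | Type   | Default     | Description                              |
|--------------|--------|-------------|------------------------------------------|
| init_time    | string | required    | ISO init time, e.g. "2024-01-15T00"      |
| steps        | int    | 40          | Number of autoregressive steps           |
| save_dir     | string | from config | Directory to write output NetCDF files   |
| save_workers | int    | 4           | CPU workers for async NetCDF writes      |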

Response:

{
  "status": "ok",
  "init_time": "2024-01-15T00Z",
  "steps": 40,
  "lead_time_hours": 240,
  "save_dir": "/path/to/output"
}

Output files follow the same naming convention as credit realtime:

<save_dir>/<YYYY-MM-DDTHH>Z/pred_<YYYY-MM-DDTHH>Z_<FHR:03d>.nc
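To make the pattern concrete, here is a small sketch that builds the expected path for one output file. It assumes FHR is the forecast lead time in hours, zero-padded to three digits; the function name forecast_file is hypothetical and not part of CREDIT.

```python
from datetime import datetime
from pathlib import PurePosixPath


def forecast_file(save_dir, init_time, fhr):
    """Return the expected NetCDF path for one forecast hour.

    init_time is an ISO string such as "2024-01-15T00" (a trailing "Z" is tolerated).
    fhr is assumed to be the lead time in hours.
    """
    parsed = datetime.strptime(init_time.rstrip("Z"), "%Y-%m-%dT%H")
    stamp = parsed.strftime("%Y-%m-%dT%H") + "Z"
    return PurePosixPath(save_dir) / stamp / f"pred_{stamp}_{fhr:03d}.nc"


print(forecast_file("/scratch/me/forecasts", "2024-01-15T00", 6))
# /scratch/me/forecasts/2024-01-15T00Z/pred_2024-01-15T00Z_006.nc
```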

Example:

curl -X POST http://localhost:8000/forecast \
    -H "Content-Type: application/json" \
    -d '{"init_time": "2024-01-15T00", "steps": 40}'

# Custom output directory
curl -X POST http://localhost:8000/forecast \
    -H "Content-Type: application/json" \
    -d '{"init_time": "2024-01-15T00", "steps": 40, "save_dir": "/scratch/me/forecasts"}'
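The same requests can be issued from Python. The sketch below uses only the standard library and mirrors the request fields documented above; the helper names (build_forecast_request, run_forecast) and the 300 s default timeout are assumptions chosen for illustration.

```python
import json
import urllib.request


def build_forecast_request(base_url, init_time, steps=40, save_dir=None, save_workers=None):
    """Build a POST /forecast request; optional fields are sent only when set."""
    payload = {"init_time": init_time, "steps": steps}
    if save_dir is not None:
        payload["save_dir"] = save_dir
    if save_workers is not None:
        payload["save_workers"] = save_workers
    return urllib.request.Request(
        f"{base_url}/forecast",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_forecast(base_url, init_time, timeout=300.0, **fields):
    """Send the request and return the decoded JSON response.

    The call blocks until the rollout finishes, so the timeout is generous.
    """
    req = build_forecast_request(base_url, init_time, **fields)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, run_forecast("http://localhost:8000", "2024-01-15T00", steps=40, save_dir="/scratch/me/forecasts") corresponds to the second curl command. Omitted fields fall back to the server-side defaults.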

Interactive docs

FastAPI generates interactive API docs automatically. With the server running, open your browser at:

  • Swagger UI: http://localhost:8000/docs

  • ReDoc: http://localhost:8000/redoc


Configuration

The server reads a single CREDIT v2 YAML config set via the CREDIT_CONFIG environment variable. All model, data, and normalisation settings come from that file — no extra config needed.

export CREDIT_CONFIG=/glade/work/$USER/my_run/config.yml
uvicorn applications.api:app --host 0.0.0.0 --port 8000

The config must have predict.mode: none (single-GPU inference). Multi-GPU DDP/FSDP serving is not supported via the API; use credit rollout for that.
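A pre-launch check along these lines can catch a misconfigured file before the server starts. This is only a sketch: it assumes the YAML has already been parsed into nested dictionaries (for example with a YAML loader), that predict.mode refers to a mode key under a top-level predict section, and that the value loads as the string "none". The function name check_serving_config is hypothetical.

```python
def check_serving_config(cfg):
    """Raise ValueError unless the parsed config requests single-GPU inference."""
    predict = cfg.get("predict") or {}
    mode = predict.get("mode")
    # Require an explicit string "none"; a missing key is treated as an error.
    if not isinstance(mode, str) or mode.lower() != "none":
        raise ValueError(
            f"predict.mode must be 'none' for API serving, got {mode!r}"
        )


check_serving_config({"predict": {"mode": "none"}})  # passes silently
```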


Deployment notes

Workers: always use --workers 1. The model is loaded into GPU memory once at startup; multiple workers would each load their own copy and quickly exhaust VRAM.

Timeouts: requests block until the rollout finishes. A 40-step 1-degree rollout takes roughly 30–60 s on an A100. Set your client and reverse-proxy timeouts accordingly (e.g. --timeout-keep-alive 300 for uvicorn behind nginx, together with a matching proxy_read_timeout in nginx, whose default is 60 s).

GPU: the server automatically uses cuda:0 if a GPU is available, otherwise falls back to CPU (much slower — not recommended for production).

On NCAR clusters: run the server on a login node or an interactive Casper job. Do not run long-lived servers inside PBS batch jobs.

Docker / Kubernetes: see the Quickstart for the path to containerisation. The server is the natural target for a Kubernetes Deployment with a GPU node selector and a liveness probe pointed at /health.