AI Assistant (credit ask)#
credit ask is a unified AI assistant built into CREDIT. It automatically runs in the
best mode available based on your API keys:
Agent mode (when
ANTHROPIC_API_KEYis set) — multi-turn agentic loop that reads your files, inspects logs, and runs shell commands before answering.Simple chat (fallback when Anthropic is unavailable) — one-shot Q&A using Groq, Gemini, OpenAI, or Anthropic Haiku, whichever key is set.
Quick start#
pip install "miles-credit[ask]"
# Free — no card needed, works immediately
export GROQ_API_KEY=gsk_... # https://console.groq.com
# Or use your own Anthropic key to unlock agent mode (~$0.01–0.05/session)
export ANTHROPIC_API_KEY=sk-ant-... # https://console.anthropic.com
credit ask "why did my last training run crash?"
credit ask -c config.yml "is my learning rate too high for 0.25 degree?"
Note: A Claude.ai Pro subscription does not include API access — billing is separate at console.anthropic.com/settings/billing. A typical session costs $0.01–0.05. See Cost for details.
NCAR users: institutional API access is in development. Check back soon.
Two modes, one command#
credit ask automatically picks the best mode for your setup:
Agent mode (Anthropic)#
When ANTHROPIC_API_KEY is set and the anthropic package is installed, credit ask
runs a multi-turn agentic loop:
Your question
↓
Agent decides what to look at
↓
Reads PBS log → finds CUDA OOM traceback
↓
Reads config → sees train_batch_size: 8
↓
Reads source → confirms memory layout for 0.25° × 18 levels
↓
Answer: "Reduce batch size to 4 or enable amp: True …"
It keeps going — reading more files, running more commands — until it has enough information to give you a specific, actionable answer.
Simple chat (fallback)#
When Anthropic is unavailable, credit ask falls back to one-shot Q&A using whichever
provider key is set. Provider priority (first found wins):
Provider |
Env var |
Model |
Cost |
|---|---|---|---|
Anthropic |
|
Claude Haiku |
Pay-per-use (NCAR: shared) |
OpenAI |
|
GPT-4o |
Pay-per-use |
|
Gemini 1.5 Pro |
Free for NCAR via AI Studio |
|
Groq |
|
Llama 3 Instant |
Free tier (no card needed) |
Simple chat injects your config, training log, and most recent PBS output as context
when you pass -c.
Installation and setup#
1. Install the package#
pip install "miles-credit[ask]"
2. Get an API key#
Free option — no account required beyond a free Groq signup:
export GROQ_API_KEY=gsk_... # https://console.groq.com (free tier, no card needed)
Anthropic (enables full agent mode):
Sign up or log in at console.anthropic.com
Go to API Keys → Create Key
Add credits at Settings → Billing (pay-as-you-go, no subscription required)
export ANTHROPIC_API_KEY=sk-ant-...
# Persist across sessions
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
NCAR users: institutional API access is in development. Check back soon.
Usage reference#
credit ask [-c CONFIG] [--max-turns N] [--provider PROVIDER] QUESTION
Argument |
Description |
|---|---|
|
Your question or task in plain English |
|
Path to your run’s YAML config — agent gets your config, training log, and most recent PBS output as starting context |
|
Stop after N agentic turns (default: 20). Only applies in agent mode. |
|
Force a specific provider for simple chat: |
Example sessions#
Diagnose a training crash#
credit ask -c config/wxformer_1dg_6hr_v2.yml "why did my training run crash?"
The agent will:
Read your config to find
save_locGlob for PBS output files (
*.o*) and read the most recent oneLocate the traceback
If it’s an OOM, read your config’s batch size, model dimensions, and
ampsettingReturn a specific fix: e.g. “reduce
train_batch_sizefrom 8 to 4, or setamp: True”
Check job queue and walltime#
credit ask "how many of my jobs are queued on Derecho, and when does the running one expire?"
The agent runs qstat -u $USER, parses the output, and tells you exactly what’s running,
how much walltime remains, and whether anything is stuck in queue.
Config review before a long run#
credit ask -c config/big_run.yml \
"I'm about to start a 200-epoch run on 8 H100s. Review my config for anything that would waste compute or cause it to fail."
The agent reads your full config and cross-references it with the source code to flag issues —
wrong num_epoch vs epochs ratio, missing save_best_weights, use_scheduler: False with
a large run, etc.
Understand source code#
credit ask "walk me through how apply_preblocks assembles the batch tensor — what goes into x and what goes into y?"
The agent reads credit/preblock/__init__.py and the relevant trainer code and gives you a
plain-English explanation with line references.
Compare two configs#
credit ask "compare config/run_a.yml and config/run_b.yml and explain every difference"
The agent reads both files and produces a structured diff with explanations of what each difference means for training behaviour.
Debug a data loading hang#
credit ask -c config.yml "my training job starts but then hangs and never prints a loss — what's wrong?"
The agent checks your thread_workers, prefetch_factor, and dataset size, estimates
DataLoader memory usage, and flags if you’re likely hitting an OOM or deadlock.
What the agent can access#
In agent mode the assistant has three tools. All are read-only — it cannot modify, delete, or move files, and cannot submit or cancel jobs.
read_file#
Reads any file you have filesystem access to. Returns up to 400 lines from the end of the
file by default (configurable via the tail parameter the agent chooses internally).
Best used for: configs, PBS output logs, Python tracebacks, source files, checkpoint metadata.
list_files#
Glob-style file discovery. The agent uses this to find your PBS logs, locate configs, or discover checkpoint directories.
Examples it might run internally:
*.o* → find PBS output files in current directory
logs/**/*.txt → find all log files recursively
save_loc/** → find checkpoints for your run
bash#
Runs read-only shell commands with a 30-second timeout. Permitted commands include:
Command |
What it’s used for |
|---|---|
|
Check job queue status |
|
Search files for patterns |
|
Read the end/start of large files |
|
Locate files by name or modification time |
|
Inspect repo history |
|
List files, count lines |
|
Compare two files |
The following are blocked regardless of how they’re phrased:
rm, mv, cp, git push, git reset, git checkout, qdel, scancel, kill,
pip install, conda install, sudo, and any output-redirect operators (>, >>).
Tips for best results#
Give it your config with -c. Without it the agent has to search for context; with it
the agent starts with your full run setup and gets to the answer faster.
Be specific about what went wrong. “it crashed” forces the agent to explore; “it crashed with CUDA OOM at epoch 3” lets it skip the discovery phase and go straight to solutions.
For source code questions, name the thing. “how does apply_preblocks work?” is better than “how does the data pipeline work?” because the agent can immediately read the right module.
Use --max-turns for very complex tasks. The default of 20 is enough for most
debugging sessions. For a full config audit or multi-file code review, --max-turns 40
gives the agent more room.
Cost#
Agent mode uses claude-sonnet-4-6 — Anthropic’s mid-tier model that balances
capability with cost.
Session type |
Typical turns |
Approximate cost |
|---|---|---|
Simple Q&A (no files needed) |
1–2 |
< $0.005 |
Diagnose a crash (read log + config) |
3–6 |
$0.01–0.02 |
Full config review |
5–10 |
$0.02–0.05 |
Multi-file code investigation |
10–20 |
$0.05–0.15 |
Simple chat (Haiku/Groq/Gemini) costs significantly less or nothing.
Pricing is based on Anthropic’s published input/output token rates. A full PBS output log is typically 5,000–20,000 tokens; a config is 1,000–3,000 tokens.
Troubleshooting#
Anthropic API key has no credits
Add credits at console.anthropic.com/settings/billing.
API credits are separate from a Claude.ai subscription.
credit ask will automatically fall back to simple chat providers if Anthropic credits run out.
No API key found
# Free option — no card needed:
export GROQ_API_KEY=gsk_... # https://console.groq.com
# Or use your own Anthropic key for agent mode:
export ANTHROPIC_API_KEY=sk-ant-... # https://console.anthropic.com
anthropic package required
pip install "miles-credit[ask]"
Agent gives a generic answer and doesn’t read files
Make sure you’re passing -c config.yml so it has a starting point. You can also be explicit:
credit ask "read the most recent *.o* file in this directory and tell me if there are any errors"
Agent hits max_turns without finishing
Increase the limit:
credit ask --max-turns 40 -c config.yml "do a full audit of my training setup"