benchmark_parallelism
Parallelism benchmark for CREDIT v2 WXFormer.
Measures step time (ms), peak GPU memory (GB), and throughput (samples/sec) using synthetic data, so no real ERA5 data is required.
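This page does not show how `main()` times a step. The following is only a minimal sketch of a warmup-then-measure loop. It assumes that the `--warmup` and `--steps` options count untimed and timed iterations. `time_steps` and its `sync` parameter are hypothetical names. On a GPU, CUDA kernels run asynchronously, so a caller would pass `torch.cuda.synchronize` as `sync`. Peak memory would typically be read with `torch.cuda.reset_peak_memory_stats()` before the timed loop and `torch.cuda.max_memory_allocated()` after it.

```python
import statistics
import time
from typing import Callable


def time_steps(step_fn: Callable[[], None], warmup: int = 5, steps: int = 20,
               sync: Callable[[], None] = lambda: None) -> float:
    """Hypothetical helper: mean wall-clock time per step, in milliseconds.

    `sync` blocks until queued device work is done; on a GPU one would pass
    torch.cuda.synchronize (assumption: the real script may time differently).
    """
    # Untimed warmup iterations absorb one-off costs such as lazy
    # initialization and allocator growth.
    for _ in range(warmup):
        step_fn()
    sync()  # make sure warmup work has finished before the clock starts

    durations = []
    for _ in range(steps):
        start = time.perf_counter()
        step_fn()
        sync()  # include the step's asynchronous work in its measured time
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations) * 1000.0
```

For example, `time_steps(train_step, warmup=5, steps=20, sync=torch.cuda.synchronize)` would work, where `train_step` is a hypothetical closure that runs one forward/backward/optimizer step on the synthetic batch. Synchronizing inside the timed region means each measurement covers the GPU work and not only the kernel launches.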
- Usage (torchrun):
  torchrun --standalone --nproc-per-node=4 applications/benchmark_parallelism.py -c config/gen_2/smoke/fsdp2_parallel_test.yml [--data fsdp2] [--tensor 2] [--domain 1] [--warmup 5] [--steps 20]
- Output: rank 0 emits one TSV row with the columns:
  config_name dp tp domain world_size step_ms peak_mem_gb samples_per_sec
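To illustrate the output row, here is a sketch that formats one TSV line from values that have already been measured. `format_tsv_row`, `COLUMNS`, and the `global_batch` argument are hypothetical. The unit conversions are also assumptions. GB is taken as 10^9 bytes. Samples/sec is taken as the number of samples processed per step across data-parallel replicas, divided by the step time.

```python
# Column order taken from the documented output line.
COLUMNS = ("config_name", "dp", "tp", "domain", "world_size",
           "step_ms", "peak_mem_gb", "samples_per_sec")


def format_tsv_row(config_name: str, dp: int, tp: int, domain: int,
                   world_size: int, step_ms: float, peak_mem_bytes: int,
                   global_batch: int) -> str:
    """Hypothetical helper: build one tab-separated result row."""
    # Assumption: peak_mem_bytes comes from torch.cuda.max_memory_allocated()
    # on rank 0, and "GB" means 1e9 bytes.
    peak_mem_gb = peak_mem_bytes / 1e9
    # Assumption: throughput = samples per step / seconds per step.
    samples_per_sec = global_batch / (step_ms / 1000.0)
    values = (config_name, dp, tp, domain, world_size,
              f"{step_ms:.2f}", f"{peak_mem_gb:.2f}", f"{samples_per_sec:.2f}")
    return "\t".join(str(v) for v in values)
```

With illustrative inputs, `format_tsv_row("fsdp2_parallel_test", 2, 2, 1, 4, 500.0, 3_000_000_000, 2)` gives `fsdp2_parallel_test	2	2	1	4	500.00	3.00	4.00`. In a torchrun job, only the process where `torch.distributed.get_rank() == 0` would print it.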
Functions
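| Function | Description |
|---|---|
| parse_args() | |
| make_synthetic_input(conf, device) | Build a random input tensor matching the model's expected channels/size. |
| main() | |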
Module Contents
- benchmark_parallelism.parse_args()
- benchmark_parallelism.make_synthetic_input(conf, device)
Build a random input tensor matching the model’s expected channels/size.
- benchmark_parallelism.main()
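The options that `parse_args()` handles can be inferred from the usage line. Below is a hedged `argparse` sketch of a parser that accepts them. The long name `--config` for `-c`, the types, and the defaults are assumptions. The defaults simply reuse the bracketed example values from the usage string. The documented `parse_args()` takes no arguments, and the optional `argv` parameter is added here only so the sketch can be exercised without a real command line.

```python
import argparse


def parse_args(argv=None):
    """Sketch of a parser for the documented CLI (names/defaults assumed)."""
    parser = argparse.ArgumentParser(description="Parallelism benchmark (sketch)")
    # Assumption: -c has the long form --config and is mandatory.
    parser.add_argument("-c", "--config", required=True,
                        help="Path to the YAML config file")
    # Assumption: defaults below mirror the bracketed usage values.
    parser.add_argument("--data", default="fsdp2",
                        help="Data-parallel mode (assumed to be a string)")
    parser.add_argument("--tensor", type=int, default=2,
                        help="Tensor-parallel size (assumed integer)")
    parser.add_argument("--domain", type=int, default=1,
                        help="Domain-parallel size (assumed integer)")
    parser.add_argument("--warmup", type=int, default=5,
                        help="Number of untimed warmup steps")
    parser.add_argument("--steps", type=int, default=20,
                        help="Number of timed steps")
    # argv=None makes argparse read sys.argv[1:], as a real script would.
    return parser.parse_args(argv)
```

For example, `parse_args(["-c", "config/gen_2/smoke/fsdp2_parallel_test.yml", "--tensor", "4"])` returns a namespace with `tensor == 4` and the remaining options at their assumed defaults.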