benchmark_parallelism

Parallelism benchmark for CREDIT v2 WXFormer.

Measures step time (ms), peak GPU memory (GB), and throughput (samples/sec) using synthetic data. No real ERA5 data required.
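The measurement described above can be sketched as a warmup phase followed by a timed loop. This is a minimal illustration and not the script's actual code: `time_steps`, `step_fn`, and `sync` are hypothetical names, and the defaults only mirror the example values in the usage line.

```python
import time
from typing import Callable, Optional


def time_steps(
    step_fn: Callable[[], None],
    batch_size: int,
    warmup: int = 5,
    steps: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Time `steps` calls of `step_fn` after `warmup` untimed calls.

    `sync` is an optional barrier (for example `torch.cuda.synchronize`)
    called before reading the clock so that queued GPU work is counted.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")

    for _ in range(warmup):  # untimed iterations
        step_fn()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    step_ms = 1000.0 * elapsed / steps
    # Throughput of this process only; how the real script aggregates
    # across ranks is not specified here.
    samples_per_sec = batch_size * steps / elapsed
    return {"step_ms": step_ms, "samples_per_sec": samples_per_sec}
```

On a CUDA device, peak memory is commonly read after the timed loop with `torch.cuda.max_memory_allocated()`, which returns bytes. Whether the script uses that call is an assumption.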

Usage (torchrun):

torchrun --standalone --nproc-per-node=4 applications/benchmark_parallelism.py -c config/gen_2/smoke/fsdp2_parallel_test.yml [--data fsdp2] [--tensor 2] [--domain 1] [--warmup 5] [--steps 20]
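The flags in the usage line could be parsed roughly as follows. This is a sketch under assumptions: the flag names come from the usage line, but the long form `--config`, the argument types, and the defaults are guesses, and `parse_args_sketch` is a hypothetical stand-in for the module's own `parse_args()`.

```python
import argparse


def parse_args_sketch(argv=None):
    """Parse the benchmark flags shown in the usage line (assumed types)."""
    parser = argparse.ArgumentParser(description="Parallelism benchmark (sketch)")
    # "--config" as the long form of "-c" is an assumption.
    parser.add_argument("-c", "--config", required=True, help="path to YAML config")
    parser.add_argument("--data", default=None, help="data-parallel mode, e.g. fsdp2")
    parser.add_argument("--tensor", type=int, default=None, help="tensor-parallel size")
    parser.add_argument("--domain", type=int, default=None, help="domain-parallel size")
    # Defaults below mirror the example values; the real defaults may differ.
    parser.add_argument("--warmup", type=int, default=5, help="untimed iterations")
    parser.add_argument("--steps", type=int, default=20, help="timed iterations")
    return parser.parse_args(argv)
```

Passing `argv` explicitly keeps the parser testable without touching `sys.argv`.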

Output: a single TSV row, printed by rank 0 only, with the columns:

config_name dp tp domain world_size step_ms peak_mem_gb samples_per_sec
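A row in this layout could be emitted as sketched below. The column order is taken from the list above; `format_tsv_row`, the three-decimal float formatting, and the sample values are hypothetical placeholders, not measured results. `torchrun` sets the `RANK` environment variable for each process, which is used here to restrict printing to rank 0.

```python
import os

COLUMNS = [
    "config_name", "dp", "tp", "domain", "world_size",
    "step_ms", "peak_mem_gb", "samples_per_sec",
]


def format_tsv_row(result: dict) -> str:
    """Join the result fields with tabs, in the documented column order."""
    cells = []
    for name in COLUMNS:
        value = result[name]
        # Float precision is an assumption made for readability.
        cells.append(f"{value:.3f}" if isinstance(value, float) else str(value))
    return "\t".join(cells)


# Placeholder values for illustration only.
example = {
    "config_name": "fsdp2_parallel_test", "dp": 2, "tp": 2, "domain": 1,
    "world_size": 4, "step_ms": 0.0, "peak_mem_gb": 0.0, "samples_per_sec": 0.0,
}

if int(os.environ.get("RANK", "0")) == 0:
    print(format_tsv_row(example))
```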

Functions

parse_args()

make_synthetic_input(conf, device)

Build a random input tensor matching the model's expected channels/size.

main()

Module Contents

benchmark_parallelism.parse_args()
benchmark_parallelism.make_synthetic_input(conf, device)

Build a random input tensor matching the model’s expected channels/size.

benchmark_parallelism.main()