benchmark_parallelism
=====================

.. py:module:: benchmark_parallelism

.. autoapi-nested-parse::

   Parallelism benchmark for CREDIT v2 WXFormer.

   Measures step time (ms), peak GPU memory (GB), and throughput (samples/sec)
   using synthetic data. No real ERA5 data required.

   Usage (torchrun)::

       torchrun --standalone --nproc-per-node=4 applications/benchmark_parallelism.py \
           -c config/gen_2/smoke/fsdp2_parallel_test.yml \
           [--data fsdp2] [--tensor 2] [--domain 1] \
           [--warmup 5] [--steps 20]

   Output: one TSV row, emitted by rank 0, with the columns::

       config_name  dp  tp  domain  world_size  step_ms  peak_mem_gb  samples_per_sec
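
   The warm-up and timed-step pattern implied by ``--warmup`` and ``--steps``
   can be sketched as follows. This is a minimal illustration, not the module's
   actual code: ``time_steps``, ``samples_per_sec``, the ``sync`` hook, and the
   ``global_batch_size`` parameter are hypothetical names, and the throughput
   formula (global batch size divided by step time) is an assumption.

   .. code-block:: python

      import time


      def time_steps(step_fn, warmup=5, steps=20, sync=None):
          """Return the mean wall-clock time per step in milliseconds."""
          # Warm-up iterations run untimed so one-off costs are excluded.
          for _ in range(warmup):
              step_fn()
          # Optional hook to wait for asynchronous work before reading the clock.
          if sync is not None:
              sync()
          start = time.perf_counter()
          for _ in range(steps):
              step_fn()
          if sync is not None:
              sync()
          elapsed = time.perf_counter() - start
          return elapsed / steps * 1000.0


      def samples_per_sec(step_ms, global_batch_size):
          """Convert a step time in milliseconds to samples per second."""
          return global_batch_size / (step_ms / 1000.0)

   On a GPU, ``torch.cuda.synchronize`` is a natural choice for ``sync``,
   because CUDA kernels launch asynchronously, and
   ``torch.cuda.max_memory_allocated()`` reports peak allocated memory in bytes,
   which can then be converted to GB.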



Functions
---------

.. autoapisummary::

   benchmark_parallelism.parse_args
   benchmark_parallelism.make_synthetic_input
   benchmark_parallelism.main


Module Contents
---------------

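.. py:function:: parse_args()

   Parse the command-line options shown in the module usage: ``-c``,
   ``--data``, ``--tensor``, ``--domain``, ``--warmup``, and ``--steps``.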

.. py:function:: make_synthetic_input(conf, device)

   Build a random input tensor matching the model's expected channels/size.
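
   A rough sketch of the idea, assuming a simplified config layout. The keys
   ``channels``, ``height``, and ``width`` under ``conf["model"]``, the
   four-dimensional shape, and both helper names are hypothetical; the real
   function may read different keys and produce a different tensor layout.

   .. code-block:: python

      def synthetic_shape(conf, batch_size=1):
          """Return (batch, channels, height, width) from hypothetical keys."""
          model = conf["model"]
          return (batch_size, model["channels"], model["height"], model["width"])


      def make_random_input(conf, device="cpu", batch_size=1):
          """Build a standard-normal tensor with the shape derived above."""
          # Imported here so synthetic_shape works without PyTorch installed.
          import torch

          return torch.randn(synthetic_shape(conf, batch_size), device=device)

   Random values are sufficient for this purpose because step time and memory
   use depend on tensor shapes and dtypes, not on the values themselves.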


.. py:function:: main()

