Installing CREDIT from source#

If you want to take advantage of the full power of CREDIT, which includes scaling training across multiple nodes, you will need to build PyTorch from source. The instructions for building PyTorch from source can leave out important details if you want proper CUDA, numpy, and MPI support, so follow along here as we take you on the journey to CREDIT.

Prerequisites#

In this section, you will need to create a clean Python virtual environment and install other system packages to enable the building of PyTorch. You can even do this on your Mac or Linux laptop.

MacOS (Intel or ARM)#

  1. First, install either the Homebrew or MacPorts package managers so you can easily install all the system-level dependencies and other helpful Linux programs like wget, gcc, etc. Further instructions will assume the use of Homebrew, but should also work for MacPorts. Homebrew should also install the XCode Command Line tools, which include git and clang.

  2. Install the following programs with the brew install <program> command.

    • gcc (gnu compilers for C, C++, and Fortran)

    • netcdf (for reading and writing netCDF files)

    • wget (for downloading files from the internet)

    • mpich or openmpi (for MPI distributed training support)

Linux#

  1. Install the following dependencies with your OS package manager or ask your sysadmins to make sure they are installed:

    • gcc (gnu compilers for C, C++, and Fortran)

    • netcdf (for reading and writing netCDF files)

    • wget (for downloading files from the internet)

    • mpich, openmpi, or cray-mpi (for MPI distributed training support)

  2. Install a recent version of the CUDA toolkit, cuDNN, and NCCL (needed for multi-GPU and distributed training).

  3. If you plan to do multi-node distributed training with CREDIT, you will also need to install libfabric and the aws-ofi-nccl plugin to interface between libfabric and NCCL. If you do not compile PyTorch with Libfabric and aws-ofi-nccl, we have found that inter-node communication speeds on Derecho and other HPE/Cray HPC systems are much slower.

NSF NCAR Derecho#

The NSF NCAR Derecho system has some special requirements in place to support Cray MPI and the Slingshot interconnect. Ben Kirk has created a special makefile to build PyTorch and Torchvision from source on Derecho with all appropriate environment variables and dependencies. Please follow instructions there if you wish to build your own version of PyTorch from source. Otherwise, Conda packages for PyTorch 2.5.1 and Torchvision 0.20.1 are available on Derecho at /glade/work/benkirk/consulting/conda-recipes/output/ and are used when running create_env_derecho.sh.

PyTorch Python Dependencies#

  1. Install a Python virtual environment manager. My preferred one for now is miniforge, which has both conda and mamba included and does not have the licensing issues that may come with miniconda. uv might also work but has not been tested.

  2. Create a clean virtual environment with the following command: mamba create -n credit python=3.12

  3. Activate your environment. conda activate credit

  4. Clone a release branch of PyTorch and all its third party dependencies. This will take awhile. git clone -b v2.6.0 --recursive git@github.com:pytorch/pytorch.git

  5. Go into the PyTorch repo and install its dependencies into your credit environment.

cd pytorch
pip install -r requirements.txt

Building PyTorch#

  1. Build and install PyTorch into your environment with the following command:

USE_DISTRIBUTED=1 python setup.py install

On Mac systems, the USE_DISTRIBUTED option is 0 by default and is 1 on Linux systems.

If you want to build PyTorch in a way that can be distributed to multiple users, build a wheel file. Wheel files are binary libraries that can be pip installed into other users’ environments on the same operating system.

pip install wheel build
cd pytorch # if not already in your pytorch directory
python -m build --wheel

With either build approach, PyTorch performs a bunch of configuration checks to know what files to compile for a given setup. Before letting it compile, which can take at least 1 hour, review key configuration settings in the summary. Make sure USE_DISTRIBUTED=1, that numpy is included, and if on a Linux cluster, that CUDA and NCCL have been found and are being used.

Once a wheel file is created, you can install it into a Python environment with pip install <pytorch wheel file>

Building Torchvision#

CREDIT also uses torchvision, which also needs to be built and compiled to match with your installed version of PyTorch. Instructions for building torchvision from source can be found here.

Installing CREDIT#

Clone miles-credit from github and install using pip:

git clone https://github.com:NCAR/miles-credit.git
cd miles-credit
pip install -e .

The CREDIT installer should be able to install all dependencies from PyPI using pip.

Test Your Installation#

To verify that you have installed all dependencies correctly, you can conduct the following tests. First, install and run pytest to verify all unit tests pass.

pip install pytest
cd miles-credit
pytest .

If all tests pass, run credit_train config/example-v2026.1.0.yml on a test case.