credit.trainers#
Submodules#
- credit.trainers.base_trainer
- credit.trainers.ic_optimization
- credit.trainers.trainerERA5
- credit.trainers.trainerERA5_Diffusion
- credit.trainers.trainerERA5_ensemble
- credit.trainers.trainerLES
- credit.trainers.trainerWRF
- credit.trainers.trainerWRF_multi
- credit.trainers.trainer_downscaling
- credit.trainers.trainer_om4_samudra
- credit.trainers.utils
Attributes#
- logger
- trainer_types
Classes#
- TrainerERA5: Helper class that provides a standard way to create an ABC using inheritance.
- TrainerERA5_Diffusion: Helper class that provides a standard way to create an ABC using inheritance.
- TrainerEnsemble: Helper class that provides a standard way to create an ABC using inheritance.
- Trainer404: Helper class that provides a standard way to create an ABC using inheritance.
- TrainerIC: Helper class that provides a standard way to create an ABC using inheritance.
- TrainerSamudra: Helper class that provides a standard way to create an ABC using inheritance.
- TrainerLES: Helper class that provides a standard way to create an ABC using inheritance.
- TrainerWRF: Helper class that provides a standard way to create an ABC using inheritance.
- TrainerWRFMulti: Trainer class for handling the training, validation, and checkpointing of models.
Functions#
- load_trainer(conf)
Package Contents#
- class credit.trainers.TrainerERA5(model: torch.nn.Module, rank: int)#
Bases: credit.trainers.base_trainer.BaseTrainer
Helper class that provides a standard way to create an ABC using inheritance.
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Trains the model for one epoch.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing training settings.
trainloader (DataLoader) – DataLoader for the training dataset.
optimizer (torch.optim.Optimizer) – Optimizer used for training.
criterion (callable) – Loss function used for training.
scaler (torch.cuda.amp.GradScaler) – Gradient scaler for mixed precision training.
scheduler (torch.optim.lr_scheduler._LRScheduler) – Learning rate scheduler.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing training metrics and loss for the epoch.
- Return type:
dict
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validates the model on the validation dataset.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing validation settings.
valid_loader (DataLoader) – DataLoader for the validation dataset.
criterion (callable) – Loss function used for validation.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing validation metrics and loss for the epoch.
- Return type:
dict
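The trainers on this page share the same two-method interface, `train_one_epoch` and `validate`, each returning a dictionary. The sketch below illustrates that pattern with Python's `abc` module. It is only an illustration: `SketchBaseTrainer`, `SketchTrainer`, `squared_error`, and the result keys `train_loss` and `valid_loss` are hypothetical stand-ins, and this page does not say which methods the real `BaseTrainer` declares abstract.

```python
from abc import ABC, abstractmethod


class SketchBaseTrainer(ABC):
    """Hypothetical stand-in for a trainer base class (not the real BaseTrainer)."""

    def __init__(self, model, rank):
        self.model = model
        self.rank = rank

    @abstractmethod
    def train_one_epoch(self, epoch, conf, trainloader, optimizer,
                        criterion, scaler, scheduler, metrics):
        """Return a dict of training results for one epoch."""

    @abstractmethod
    def validate(self, epoch, conf, valid_loader, criterion, metrics):
        """Return a dict of validation results for one epoch."""


class SketchTrainer(SketchBaseTrainer):
    """Toy subclass: each batch is a (prediction, target) pair of floats."""

    def train_one_epoch(self, epoch, conf, trainloader, optimizer,
                        criterion, scaler, scheduler, metrics):
        losses = [criterion(pred, target) for pred, target in trainloader]
        return {"train_loss": sum(losses) / len(losses)}

    def validate(self, epoch, conf, valid_loader, criterion, metrics):
        losses = [criterion(pred, target) for pred, target in valid_loader]
        return {"valid_loss": sum(losses) / len(losses)}


def squared_error(pred, target):
    """Illustrative loss function on plain floats."""
    return (pred - target) ** 2


trainer = SketchTrainer(model=None, rank=0)
train_results = trainer.train_one_epoch(
    0, {}, [(1.0, 0.0), (3.0, 1.0)], None, squared_error, None, None, None
)
valid_results = trainer.validate(0, {}, [(2.0, 2.0)], squared_error, None)
```

Because both methods are marked abstract in this sketch, instantiating `SketchBaseTrainer` directly raises `TypeError`; only subclasses that override both methods can be constructed.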
- class credit.trainers.TrainerERA5_Diffusion(model: torch.nn.Module, rank: int)#
Bases: credit.trainers.base_trainer.BaseTrainer
Helper class that provides a standard way to create an ABC using inheritance.
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Trains the model for one epoch.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing training settings.
trainloader (DataLoader) – DataLoader for the training dataset.
optimizer (torch.optim.Optimizer) – Optimizer used for training.
criterion (callable) – Loss function used for training.
scaler (torch.cuda.amp.GradScaler) – Gradient scaler for mixed precision training.
scheduler (torch.optim.lr_scheduler._LRScheduler) – Learning rate scheduler.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing training metrics and loss for the epoch.
- Return type:
dict
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validates the model on the validation dataset.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing validation settings.
valid_loader (DataLoader) – DataLoader for the validation dataset.
criterion (callable) – Loss function used for validation.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing validation metrics and loss for the epoch.
- Return type:
dict
- class credit.trainers.TrainerEnsemble(model: torch.nn.Module, rank: int)#
Bases: credit.trainers.base_trainer.BaseTrainer
Helper class that provides a standard way to create an ABC using inheritance.
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Trains the model for one epoch.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing training settings.
trainloader (DataLoader) – DataLoader for the training dataset.
optimizer (torch.optim.Optimizer) – Optimizer used for training.
criterion (callable) – Loss function used for training.
scaler (torch.cuda.amp.GradScaler) – Gradient scaler for mixed precision training.
scheduler (torch.optim.lr_scheduler._LRScheduler) – Learning rate scheduler.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing training metrics and loss for the epoch.
- Return type:
dict
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validates the model on the validation dataset.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing validation settings.
valid_loader (DataLoader) – DataLoader for the validation dataset.
criterion (callable) – Loss function used for validation.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing validation metrics and loss for the epoch.
- Return type:
dict
- class credit.trainers.Trainer404(model: torch.nn.Module, rank: int)#
Bases: credit.trainers.base_trainer.BaseTrainer
Helper class that provides a standard way to create an ABC using inheritance.
- setup(conf)#
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Train the model for one epoch.
- Parameters:
epoch (int) – The current epoch number.
conf (Dict[str, Any]) – The configuration dictionary.
trainloader (torch.utils.data.DataLoader) – The training data loader.
optimizer (torch.optim.Optimizer) – The optimizer.
criterion (torch.nn.Module) – The loss function.
scaler (torch.cuda.amp.GradScaler) – The gradient scaler for mixed precision training.
scheduler (torch.optim.lr_scheduler.LRScheduler) – The learning rate scheduler.
metrics (Dict[str, Any]) – The metrics to track during training.
- Returns:
A dictionary containing the training results.
- Return type:
Dict[str, float]
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validates the model on the validation dataset.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing validation settings.
valid_loader (DataLoader) – DataLoader for the validation dataset.
criterion (callable) – Loss function used for validation.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing validation metrics and loss for the epoch.
- Return type:
dict
- class credit.trainers.TrainerIC(model: torch.nn.Module, rank: int)#
Bases: credit.trainers.base_trainer.BaseTrainer
Helper class that provides a standard way to create an ABC using inheritance.
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Trains the model for one epoch.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing training settings.
trainloader (DataLoader) – DataLoader for the training dataset.
optimizer (torch.optim.Optimizer) – Optimizer used for training.
criterion (callable) – Loss function used for training.
scaler (torch.cuda.amp.GradScaler) – Gradient scaler for mixed precision training.
scheduler (torch.optim.lr_scheduler._LRScheduler) – Learning rate scheduler.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing training metrics and loss for the epoch.
- Return type:
dict
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validates the model on the validation dataset.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing validation settings.
valid_loader (DataLoader) – DataLoader for the validation dataset.
criterion (callable) – Loss function used for validation.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing validation metrics and loss for the epoch.
- Return type:
dict
- class credit.trainers.TrainerSamudra(model: torch.nn.Module, rank: int)#
Bases: credit.trainers.base_trainer.BaseTrainer
Helper class that provides a standard way to create an ABC using inheritance.
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Trains the model for one epoch.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing training settings.
trainloader (DataLoader) – DataLoader for the training dataset.
optimizer (torch.optim.Optimizer) – Optimizer used for training.
criterion (callable) – Loss function used for training.
scaler (torch.cuda.amp.GradScaler) – Gradient scaler for mixed precision training.
scheduler (torch.optim.lr_scheduler._LRScheduler) – Learning rate scheduler.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing training metrics and loss for the epoch.
- Return type:
dict
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validates the model on the validation dataset.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing validation settings.
valid_loader (DataLoader) – DataLoader for the validation dataset.
criterion (callable) – Loss function used for validation.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing validation metrics and loss for the epoch.
- Return type:
dict
- class credit.trainers.TrainerLES(model: torch.nn.Module, rank: int)#
Bases: credit.trainers.base_trainer.BaseTrainer
Helper class that provides a standard way to create an ABC using inheritance.
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Train the model for one epoch.
- Parameters:
epoch (int) – The current epoch number.
conf (Dict[str, Any]) – The configuration dictionary.
trainloader (torch.utils.data.DataLoader) – The training data loader.
optimizer (torch.optim.Optimizer) – The optimizer.
criterion (torch.nn.Module) – The loss function.
scaler (torch.cuda.amp.GradScaler) – The gradient scaler for mixed precision training.
scheduler (torch.optim.lr_scheduler.LRScheduler) – The learning rate scheduler.
metrics (Dict[str, Any]) – The metrics to track during training.
- Returns:
A dictionary containing the training results.
- Return type:
Dict[str, float]
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validate the model on the validation set.
- Parameters:
epoch (int) – The current epoch number.
conf (Dict[str, Any]) – The configuration dictionary.
valid_loader (torch.utils.data.DataLoader) – The validation data loader.
criterion (torch.nn.Module) – The loss function.
metrics (Dict[str, Any]) – The metrics to track during validation.
- Returns:
A dictionary containing the validation results.
- Return type:
Dict[str, float]
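The return type `Dict[str, float]` implies one scalar per tracked quantity for the whole epoch. One common way to arrive at such a dictionary is to average per-batch values, sketched below. This is an assumption about how the results could be reduced, not a description of what `TrainerLES` does internally; `average_batch_results` and the keys `loss` and `mae` are hypothetical.

```python
from collections import defaultdict


def average_batch_results(batch_results):
    """Average a list of per-batch {name: value} dicts into one epoch-level dict."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for result in batch_results:
        for name, value in result.items():
            totals[name] += float(value)
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Hypothetical per-batch values; the keys are illustrative only.
batches = [
    {"loss": 0.5, "mae": 1.0},
    {"loss": 0.25, "mae": 2.0},
]
epoch_results = average_batch_results(batches)
```

Counting per key rather than per batch keeps the average correct when a quantity is reported only on some batches.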
- class credit.trainers.TrainerWRF(model: torch.nn.Module, rank: int)#
Bases: credit.trainers.base_trainer.BaseTrainer
Helper class that provides a standard way to create an ABC using inheritance.
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Train the model for one epoch.
- Parameters:
epoch (int) – The current epoch number.
conf (Dict[str, Any]) – The configuration dictionary.
trainloader (torch.utils.data.DataLoader) – The training data loader.
optimizer (torch.optim.Optimizer) – The optimizer.
criterion (torch.nn.Module) – The loss function.
scaler (torch.cuda.amp.GradScaler) – The gradient scaler for mixed precision training.
scheduler (torch.optim.lr_scheduler.LRScheduler) – The learning rate scheduler.
metrics (Dict[str, Any]) – The metrics to track during training.
- Returns:
A dictionary containing the training results.
- Return type:
Dict[str, float]
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validate the model on the validation set.
- Parameters:
epoch (int) – The current epoch number.
conf (Dict[str, Any]) – The configuration dictionary.
valid_loader (torch.utils.data.DataLoader) – The validation data loader.
criterion (torch.nn.Module) – The loss function.
metrics (Dict[str, Any]) – The metrics to track during validation.
- Returns:
A dictionary containing the validation results.
- Return type:
Dict[str, float]
- class credit.trainers.TrainerWRFMulti(model: torch.nn.Module, rank: int)#
Bases: credit.trainers.base_trainer.BaseTrainer
Trainer class for handling the training, validation, and checkpointing of models.
This class is responsible for executing the training loop, validating the model on a separate dataset, and managing checkpoints during training. It supports both single-GPU and distributed (FSDP, DDP) training.
- model#
The model to be trained.
- Type:
torch.nn.Module
- rank#
The rank of the process in distributed training.
- Type:
int
- module#
If True, use model with module parallelism (default: False).
- Type:
bool
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Perform training for one epoch and return training metrics.
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validate the model on the validation dataset and return validation metrics.
- fit_deprecated(conf, train_loader, valid_loader, optimizer, train_criterion, valid_criterion, scaler, scheduler, metrics, trial=False)#
Perform the full training loop across multiple epochs, including validation and checkpointing.
- train_one_epoch(epoch, conf, trainloader, optimizer, criterion, scaler, scheduler, metrics)#
Trains the model for one epoch.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing training settings.
trainloader (DataLoader) – DataLoader for the training dataset.
optimizer (torch.optim.Optimizer) – Optimizer used for training.
criterion (callable) – Loss function used for training.
scaler (torch.cuda.amp.GradScaler) – Gradient scaler for mixed precision training.
scheduler (torch.optim.lr_scheduler._LRScheduler) – Learning rate scheduler.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing training metrics and loss for the epoch.
- Return type:
dict
- validate(epoch, conf, valid_loader, criterion, metrics)#
Validates the model on the validation dataset.
- Parameters:
epoch (int) – Current epoch number.
conf (dict) – Configuration dictionary containing validation settings.
valid_loader (DataLoader) – DataLoader for the validation dataset.
criterion (callable) – Loss function used for validation.
metrics (callable) – Function to compute metrics for evaluation.
- Returns:
Dictionary containing validation metrics and loss for the epoch.
- Return type:
dict
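The class description mentions a training loop that alternates training and validation and manages checkpoints. A minimal sketch of such a loop is given below, assuming a trainer object that exposes the two documented methods. It is not the package's `fit_deprecated` implementation: `StubTrainer`, `fit_sketch`, the `epochs` config key, the result keys, and the JSON file standing in for a model checkpoint are all hypothetical.

```python
import json
import os
import tempfile


class StubTrainer:
    """Hypothetical trainer whose validation loss follows a scripted sequence."""

    def __init__(self, valid_losses):
        self.valid_losses = valid_losses

    def train_one_epoch(self, epoch, conf, trainloader, optimizer,
                        criterion, scaler, scheduler, metrics):
        return {"train_loss": 1.0 / (epoch + 1)}

    def validate(self, epoch, conf, valid_loader, criterion, metrics):
        return {"valid_loss": self.valid_losses[epoch]}


def fit_sketch(trainer, conf, save_dir):
    """Run train/validate for conf['epochs'] epochs, saving on improvement."""
    best_loss = float("inf")
    history = []
    for epoch in range(conf["epochs"]):
        train_results = trainer.train_one_epoch(
            epoch, conf, None, None, None, None, None, None
        )
        valid_results = trainer.validate(epoch, conf, None, None, None)
        record = {"epoch": epoch, **train_results, **valid_results}
        history.append(record)
        if valid_results["valid_loss"] < best_loss:
            best_loss = valid_results["valid_loss"]
            # Stand-in for a real checkpoint: write the best record as JSON.
            with open(os.path.join(save_dir, "best.json"), "w") as f:
                json.dump(record, f)
    return history


with tempfile.TemporaryDirectory() as save_dir:
    history = fit_sketch(StubTrainer([0.9, 0.4, 0.6]), {"epochs": 3}, save_dir)
    with open(os.path.join(save_dir, "best.json")) as f:
        best = json.load(f)
```

In this run the saved record is the one from epoch 1, since the scripted validation loss rises again at epoch 2 and the file is only overwritten on improvement.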
- credit.trainers.logger#
- credit.trainers.trainer_types#
- credit.trainers.load_trainer(conf)#
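This page gives no description for `trainer_types` or `load_trainer(conf)`. Their names suggest a lookup from a configuration entry to a trainer class, and the sketch below shows one way such a lookup could be written. Everything in it is an assumption made for illustration: the registry contents, the `conf["trainer"]["type"]` key, the error handling, and the names `sketch_trainer_types`, `load_trainer_sketch`, `SketchTrainerA`, and `SketchTrainerB`.

```python
import logging

logger = logging.getLogger(__name__)


class SketchTrainerA:
    """Hypothetical trainer class used only for this illustration."""


class SketchTrainerB:
    """Hypothetical trainer class used only for this illustration."""


# Hypothetical registry: maps a string key to a trainer class.
sketch_trainer_types = {
    "a": SketchTrainerA,
    "b": SketchTrainerB,
}


def load_trainer_sketch(conf):
    """Return the trainer class named by conf['trainer']['type'] (assumed key)."""
    trainer_type = conf["trainer"]["type"]
    if trainer_type not in sketch_trainer_types:
        raise ValueError(
            f"Unknown trainer type {trainer_type!r}; "
            f"expected one of {sorted(sketch_trainer_types)}"
        )
    logger.info("Loading trainer type %s", trainer_type)
    return sketch_trainer_types[trainer_type]


trainer_cls = load_trainer_sketch({"trainer": {"type": "b"}})
```

Returning the class rather than an instance leaves construction, with `model` and `rank`, to the caller. Check the real `load_trainer` source before relying on either behavior.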