credit.samplers#
Attributes#
Classes#
MultiStepBatchSamplerSubset | Base class for all Samplers.
DistributedMultiStepBatchSampler | Sampler that restricts data loading to a subset of the dataset.
Module Contents#
- credit.samplers.logger#
- class credit.samplers.MultiStepBatchSamplerSubset(dataset: torch.utils.data.Dataset, batch_size: int, index_subset, num_forecast_steps: int)#
Bases: torch.utils.data.Sampler
Base class for all Samplers.
Every Sampler subclass has to provide an __iter__() method, providing a way to iterate over indices or lists of indices (batches) of dataset elements, and may provide a __len__() method that returns the length of the returned iterators.
Example
>>> # xdoctest: +SKIP
>>> class AccedingSequenceLengthSampler(Sampler[int]):
>>>     def __init__(self, data: List[str]) -> None:
>>>         self.data = data
>>>
>>>     def __len__(self) -> int:
>>>         return len(self.data)
>>>
>>>     def __iter__(self) -> Iterator[int]:
>>>         sizes = torch.tensor([len(x) for x in self.data])
>>>         yield from torch.argsort(sizes).tolist()
>>>
>>> class AccedingSequenceLengthBatchSampler(Sampler[List[int]]):
>>>     def __init__(self, data: List[str], batch_size: int) -> None:
>>>         self.data = data
>>>         self.batch_size = batch_size
>>>
>>>     def __len__(self) -> int:
>>>         return (len(self.data) + self.batch_size - 1) // self.batch_size
>>>
>>>     def __iter__(self) -> Iterator[List[int]]:
>>>         sizes = torch.tensor([len(x) for x in self.data])
>>>         for batch in torch.chunk(torch.argsort(sizes), len(self)):
>>>             yield batch.tolist()
Note
The __len__() method isn’t strictly required by DataLoader, but is expected in any calculation involving the length of a DataLoader.
- dataset#
- num_forecast_steps#
- init_times#
- dt#
- index_subset#
- batch_size#
- num_start_batches#
- __len__()#
- __iter__()#
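The __iter__()/__len__() contract described above can be sketched without PyTorch. The class below is a hypothetical illustration, not the implementation of MultiStepBatchSamplerSubset: the name SubsetBatchSampler and its behavior (consecutive batches drawn from an index subset) are assumptions made for the example. It shows a __len__() that stays consistent with the number of batches __iter__() actually yields.

```python
from typing import Iterator, List, Sequence


class SubsetBatchSampler:
    """Hypothetical batch sampler: yields consecutive batches from an index subset."""

    def __init__(self, index_subset: Sequence[int], batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.index_subset = list(index_subset)
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Ceiling division: the last batch may be shorter than batch_size.
        return (len(self.index_subset) + self.batch_size - 1) // self.batch_size

    def __iter__(self) -> Iterator[List[int]]:
        # Step through the subset in strides of batch_size.
        for start in range(0, len(self.index_subset), self.batch_size):
            yield self.index_subset[start:start + self.batch_size]


sampler = SubsetBatchSampler(index_subset=[4, 8, 15, 16, 23], batch_size=2)
batches = list(sampler)  # [[4, 8], [15, 16], [23]]
```

An object of this shape could be passed as the batch_sampler argument of torch.utils.data.DataLoader, which accepts a Sampler or any iterable of index lists.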
- class credit.samplers.DistributedMultiStepBatchSampler(dataset: torch.utils.data.Dataset, batch_size: int, num_forecast_steps: int, num_replicas: int | None = None, rank: int | None = None, shuffle: bool = True, seed: int = 0, drop_last: bool = False)#
Bases: torch.utils.data.DistributedSampler
Sampler that restricts data loading to a subset of the dataset.
It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such a case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it.
Note
The dataset is assumed to be of constant size, and any instance of it is assumed to always return the same elements in the same order.
- Parameters:
  - dataset – Dataset used for sampling.
  - num_replicas (int, optional) – Number of processes participating in distributed training. By default, world_size is retrieved from the current distributed group.
  - rank (int, optional) – Rank of the current process within num_replicas. By default, rank is retrieved from the current distributed group.
  - shuffle (bool, optional) – If True (default), sampler will shuffle the indices.
  - seed (int, optional) – Random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Default: 0.
  - drop_last (bool, optional) – If True, then the sampler will drop the tail of the data to make it evenly divisible across the number of replicas. If False, the sampler will add extra indices to make the data evenly divisible across the replicas. Default: False.
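The effect of drop_last on how indices are split across replicas can be worked through in plain Python. This is a minimal sketch of the partitioning described for the parent torch.utils.data.DistributedSampler, with shuffling omitted. The function name partition_indices is hypothetical, and padding by repeating indices from the start of the list is an assumption about the upstream behavior. DistributedMultiStepBatchSampler overrides __iter__(), so its own partitioning may differ.

```python
import math
from typing import List


def partition_indices(dataset_len: int, num_replicas: int, rank: int,
                      drop_last: bool = False) -> List[int]:
    """Return the unshuffled indices assigned to one rank (illustrative only)."""
    indices = list(range(dataset_len))
    if drop_last and dataset_len % num_replicas != 0:
        # Drop the tail so every replica receives the same number of samples.
        num_samples = math.ceil((dataset_len - num_replicas) / num_replicas)
    else:
        num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas

    if drop_last:
        indices = indices[:total_size]
    else:
        # Pad by repeating indices from the start until the list divides evenly.
        padding = total_size - len(indices)
        if padding > 0:
            indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Each rank takes every num_replicas-th index, starting at its own rank.
    return indices[rank:total_size:num_replicas]


# 10 samples over 4 replicas: padded to 12 without drop_last, cut to 8 with it.
padded = [partition_indices(10, 4, r) for r in range(4)]
dropped = [partition_indices(10, 4, r, drop_last=True) for r in range(4)]
```

With these inputs, padded gives every rank three indices (ranks 2 and 3 each receive one repeated index), while dropped gives every rank two indices and leaves indices 8 and 9 unused.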
Warning
In distributed mode, calling the set_epoch() method at the beginning of each epoch before creating the DataLoader iterator is necessary to make shuffling work properly across multiple epochs. Otherwise, the same ordering will always be used.
Example:
>>> # xdoctest: +SKIP
>>> sampler = DistributedSampler(dataset) if is_distributed else None
>>> loader = DataLoader(dataset, shuffle=(sampler is None),
...                     sampler=sampler)
>>> for epoch in range(start_epoch, n_epochs):
...     if is_distributed:
...         sampler.set_epoch(epoch)
...     train(loader)
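Why set_epoch() matters can be illustrated with a small stand-in. The sketch below assumes the shuffle is driven by a generator seeded with seed + epoch. It uses Python's random module in place of a torch generator, so the permutations will not match those produced by PyTorch. The helper name epoch_permutation is hypothetical.

```python
import random
from typing import List


def epoch_permutation(dataset_len: int, seed: int, epoch: int) -> List[int]:
    """Shuffle indices with a generator seeded by seed + epoch (stand-in for torch)."""
    rng = random.Random(seed + epoch)
    indices = list(range(dataset_len))
    rng.shuffle(indices)
    return indices


# Two processes with the same seed and epoch agree on the ordering.
same_epoch_a = epoch_permutation(8, seed=0, epoch=0)
same_epoch_b = epoch_permutation(8, seed=0, epoch=0)

# Advancing the epoch reseeds the generator and yields a new permutation.
next_epoch = epoch_permutation(8, seed=0, epoch=1)
```

If the epoch is never advanced, every pass reuses the same seed and therefore the same ordering, which is the situation the warning describes. Keeping seed identical across processes is what lets each rank slice a consistent permutation.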
- batch_size#
- num_forecast_steps#
- __iter__()#
- __len__() -> int#