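credit.models.swin
Attributes
- logger
- image_height
Classes
- WindowMultiHeadAttentionNoPos – This class implements window-based Multi-Head-Attention without the log-spaced continuous position bias.
- WindowMultiHeadAttention – This class implements window-based Multi-Head-Attention with log-spaced continuous position bias.
- SwinTransformerV2CrBlock – This class implements the Swin transformer block.
- PatchMerging – This class implements patch merging as a strided convolution preceded by a normalization.
- PatchEmbed – 2D image to patch embedding.
- SwinTransformerV2CrStage – This class implements a stage of the Swin transformer consisting of multiple layers.
- SwinTransformerV2Cr – Swin Transformer V2.
Functions
- apply_spectral_norm(model)
- circular_pad1d(x, pad)
- bchw_to_bhwc(x) – Permutes a tensor from the shape (B, C, H, W) to (B, H, W, C).
- bhwc_to_bchw(x) – Permutes a tensor from the shape (B, H, W, C) to (B, C, H, W).
- swin_from_yaml(fname, checkpoint_stages=False)
- swinv2net(params, checkpoint_stages=False)
- window_partition(x, window_size)
- window_reverse(windows, window_size, img_size)
- init_weights(module, name='')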
Module Contents
- credit.models.swin.logger
- credit.models.swin.apply_spectral_norm(model)
- credit.models.swin.circular_pad1d(x, pad)
- credit.models.swin.bchw_to_bhwc(x: torch.Tensor) → torch.Tensor
Permutes a tensor from the shape (B, C, H, W) to (B, H, W, C).
- credit.models.swin.bhwc_to_bchw(x: torch.Tensor) → torch.Tensor
Permutes a tensor from the shape (B, H, W, C) to (B, C, H, W).
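A minimal sketch of the two layout conversions follows. It assumes they are plain `Tensor.permute` calls; the actual bodies are not shown on this page. `permute` returns a non-contiguous view, so a `.contiguous()` call may be needed before a subsequent `.view()`.

```python
import torch


def bchw_to_bhwc(x: torch.Tensor) -> torch.Tensor:
    # Move the channel axis from position 1 to the last position.
    return x.permute(0, 2, 3, 1)


def bhwc_to_bchw(x: torch.Tensor) -> torch.Tensor:
    # Move the channel axis from the last position back to position 1.
    return x.permute(0, 3, 1, 2)


x = torch.randn(2, 3, 4, 5)  # (B, C, H, W)
y = bchw_to_bhwc(x)
print(y.shape)  # torch.Size([2, 4, 5, 3])
```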
- credit.models.swin.swin_from_yaml(fname, checkpoint_stages=False)
- credit.models.swin.swinv2net(params, checkpoint_stages=False)
- credit.models.swin.window_partition(x, window_size: Tuple[int, int])
Splits a feature map into non-overlapping windows.
- Parameters:
x – Input tensor of shape (B, H, W, C)
window_size (Tuple[int, int]) – Window size (height, width)
- Returns:
windows – Tensor of shape (num_windows * B, window_size[0], window_size[1], C)
- credit.models.swin.window_reverse(windows, window_size: Tuple[int, int], img_size: Tuple[int, int])
Merges windows back into a feature map (the inverse of window_partition).
- Parameters:
windows – Tensor of shape (num_windows * B, window_size[0], window_size[1], C)
window_size (Tuple[int, int]) – Window size
img_size (Tuple[int, int]) – Image size (H, W)
- Returns:
x – Tensor of shape (B, H, W, C)
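The two shape contracts above can be illustrated with a minimal sketch. It assumes the usual Swin ordering, in which windows are grouped per batch element and then taken row-major over the window grid. It also assumes H and W are divisible by the window size; padding is not handled here.

```python
from typing import Tuple

import torch


def window_partition(x: torch.Tensor, window_size: Tuple[int, int]) -> torch.Tensor:
    """(B, H, W, C) -> (num_windows * B, wh, ww, C)."""
    B, H, W, C = x.shape
    wh, ww = window_size
    # Split H into (H // wh, wh) and W into (W // ww, ww).
    x = x.reshape(B, H // wh, wh, W // ww, ww, C)
    # Put the window-grid axes next to the batch axis, then fold them into it.
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, wh, ww, C)


def window_reverse(
    windows: torch.Tensor, window_size: Tuple[int, int], img_size: Tuple[int, int]
) -> torch.Tensor:
    """(num_windows * B, wh, ww, C) -> (B, H, W, C)."""
    H, W = img_size
    wh, ww = window_size
    C = windows.shape[-1]
    # Recover the window grid, then undo the permutation used in window_partition.
    x = windows.reshape(-1, H // wh, W // ww, wh, ww, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, H, W, C)


x = torch.arange(2 * 8 * 12 * 3, dtype=torch.float32).reshape(2, 8, 12, 3)
windows = window_partition(x, (4, 6))
print(windows.shape)  # torch.Size([8, 4, 6, 3])
```

With an 8 x 12 map and 4 x 6 windows, each sample yields 2 x 2 = 4 windows, giving 8 windows for B = 2.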
- class credit.models.swin.WindowMultiHeadAttentionNoPos(dim: int, num_heads: int, window_size: Tuple[int, int], drop_attn: float = 0.0, drop_proj: float = 0.0, sequential_attn: bool = False)
Bases: torch.nn.Module
This class implements window-based Multi-Head-Attention without the log-spaced continuous position bias.
- Parameters:
dim (int) – Number of input features
window_size (Tuple[int, int]) – Window size
num_heads (int) – Number of attention heads
drop_attn (float) – Dropout rate of attention map
drop_proj (float) – Dropout rate after projection
sequential_attn (bool) – If true, sequential self-attention is performed
- in_features: int
- window_size: Tuple[int, int]
- num_heads: int
- sequential_attn: bool = False
- qkv
- attn_drop
- proj
- proj_drop
- logit_scale
- update_input_size(new_window_size: int, **kwargs: Any) → None
Updates the window size and, with it, the pair-wise relative positions.
- Parameters:
new_window_size (int) – New window size
kwargs (Any) – Unused
- forward(x: torch.Tensor, mask: torch.Tensor | None = None) → torch.Tensor
Forward pass.
- Parameters:
x (torch.Tensor) – Input tensor of the shape (B * windows, N, C)
mask (Optional[torch.Tensor]) – Attention mask for the shift case
- Returns:
Output tensor of the shape (B * windows, N, C)
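Both attention classes carry a `logit_scale` parameter. In Swin Transformer V2, this parameter drives scaled cosine attention. The sketch below shows that computation under several assumptions:

- `logit_scale` is stored in log space and clamped at log(1 / 0.01), as in timm's SwinTransformerV2Cr.
- The shift mask has shape (windows, N, N).
- The helper name `cosine_window_attention` is hypothetical.

The qkv projection, dropout and output projection are omitted.

```python
import math
from typing import Optional

import torch
import torch.nn.functional as F


def cosine_window_attention(
    q: torch.Tensor,  # (B * windows, heads, N, head_dim)
    k: torch.Tensor,
    v: torch.Tensor,
    logit_scale: torch.Tensor,  # (heads,), stored in log space
    mask: Optional[torch.Tensor] = None,  # (windows, N, N), 0 or a large negative value
) -> torch.Tensor:
    Bw, heads, N, _ = q.shape
    # Cosine similarity between every pair of tokens in a window.
    attn = F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1)
    # Learnable per-head temperature, clamped so the scale never exceeds 1 / 0.01.
    scale = torch.clamp(logit_scale.view(1, heads, 1, 1), max=math.log(1.0 / 0.01)).exp()
    attn = attn * scale
    if mask is not None:
        num_win = mask.shape[0]
        # Broadcast the per-window mask over the batch and head dimensions.
        attn = attn.view(Bw // num_win, num_win, heads, N, N) + mask.unsqueeze(1)
        attn = attn.view(Bw, heads, N, N)
    attn = attn.softmax(dim=-1)
    return attn @ v  # (B * windows, heads, N, head_dim)


heads, N, d = 3, 16, 8
q, k, v = (torch.randn(4, heads, N, d) for _ in range(3))
logit_scale = torch.log(10 * torch.ones(heads))
out = cosine_window_attention(q, k, v, logit_scale)
print(out.shape)  # torch.Size([4, 3, 16, 8])
```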
- class credit.models.swin.WindowMultiHeadAttention(dim: int, num_heads: int, window_size: Tuple[int, int], drop_attn: float = 0.0, drop_proj: float = 0.0, meta_hidden_dim: int = 384, sequential_attn: bool = False)
Bases: torch.nn.Module
This class implements window-based Multi-Head-Attention with log-spaced continuous position bias.
- Parameters:
dim (int) – Number of input features
window_size (Tuple[int, int]) – Window size
num_heads (int) – Number of attention heads
drop_attn (float) – Dropout rate of attention map
drop_proj (float) – Dropout rate after projection
meta_hidden_dim (int) – Number of hidden features in the two-layer MLP meta network
sequential_attn (bool) – If true, sequential self-attention is performed
- in_features: int
- window_size: Tuple[int, int]
- num_heads: int
- sequential_attn: bool = False
- qkv
- attn_drop
- proj
- proj_drop
- meta_mlp
- logit_scale
- _make_pair_wise_relative_positions() → None
Initializes the pair-wise relative positions used to compute the positional biases.
- update_input_size(new_window_size: int, **kwargs: Any) → None
Updates the window size and, with it, the pair-wise relative positions.
- Parameters:
new_window_size (int) – New window size
kwargs (Any) – Unused
- _relative_positional_encodings() → torch.Tensor
Computes the relative positional encodings.
- Returns:
relative_position_bias (torch.Tensor) – Relative positional encodings of shape (1, num_heads, window_area, window_area), where window_area = window_size[0] * window_size[1]
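The log-spaced continuous position bias can be sketched as follows. It follows the Swin V2 formulation sign(d) * log(1 + |d|) applied to each pair-wise offset. The sketch assumes `meta_mlp` is a two-layer MLP with ReLU that maps each 2D offset to one bias per head, with hidden width `meta_hidden_dim`. The helper names are hypothetical, and the dropout inside the meta network is omitted.

```python
from typing import Tuple

import torch
import torch.nn as nn


def log_spaced_relative_coords(window_size: Tuple[int, int]) -> torch.Tensor:
    wh, ww = window_size
    # (2, wh * ww): row and column index of every token in the window.
    coords = torch.stack(
        torch.meshgrid(torch.arange(wh), torch.arange(ww), indexing="ij")
    ).flatten(1)
    # (N * N, 2): offset between every pair of tokens.
    rel = (coords[:, :, None] - coords[:, None, :]).permute(1, 2, 0).reshape(-1, 2).float()
    # Log-spaced coordinates: sign(d) * log(1 + |d|).
    return torch.sign(rel) * torch.log1p(rel.abs())


def relative_position_bias(
    meta_mlp: nn.Module, window_size: Tuple[int, int], num_heads: int
) -> torch.Tensor:
    window_area = window_size[0] * window_size[1]
    bias = meta_mlp(log_spaced_relative_coords(window_size))  # (N * N, num_heads)
    return bias.transpose(1, 0).reshape(num_heads, window_area, window_area).unsqueeze(0)


num_heads, meta_hidden_dim = 4, 384
meta_mlp = nn.Sequential(
    nn.Linear(2, meta_hidden_dim), nn.ReLU(), nn.Linear(meta_hidden_dim, num_heads)
)
bias = relative_position_bias(meta_mlp, (3, 5), num_heads)
print(bias.shape)  # torch.Size([1, 4, 15, 15])
```

Because the MLP consumes continuous coordinates rather than indexing a fixed table, the same weights can produce biases for a different window size after update_input_size.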
- forward(x: torch.Tensor, mask: torch.Tensor | None = None) → torch.Tensor
Forward pass.
- Parameters:
x (torch.Tensor) – Input tensor of the shape (B * windows, N, C)
mask (Optional[torch.Tensor]) – Attention mask for the shift case
- Returns:
Output tensor of the shape (B * windows, N, C)
- class credit.models.swin.SwinTransformerV2CrBlock(dim: int, num_heads: int, feat_size: Tuple[int, int], window_size: Tuple[int, int], shift_size: Tuple[int, int] = (0, 0), mlp_ratio: float = 4.0, init_values: float | None = 0, proj_drop: float = 0.0, drop_attn: float = 0.0, drop_path: float = 0.0, extra_norm: bool = False, sequential_attn: bool = False, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, rel_pos: bool = True)
Bases: torch.nn.Module
This class implements the Swin transformer block.
- Parameters:
dim (int) – Number of input channels
num_heads (int) – Number of attention heads to be utilized
feat_size (Tuple[int, int]) – Input resolution
window_size (Tuple[int, int]) – Window size to be utilized
shift_size (Tuple[int, int]) – Shifting size to be used
mlp_ratio (float) – Ratio of the hidden dimension in the FFN to the input channels
proj_drop (float) – Dropout in input mapping
drop_attn (float) – Dropout rate of attention map
drop_path (float) – Dropout in main path
extra_norm (bool) – Insert extra norm on ‘main’ branch if True
sequential_attn (bool) – If true, sequential self-attention is performed
norm_layer (Type[nn.Module]) – Type of normalization layer to be utilized
- dim: int
- feat_size: Tuple[int, int]
- target_shift_size: Tuple[int, int] = (0, 0)
- window_area
- init_values: float | None = 0
- attn
- norm1
- drop_path1
- mlp
- norm2
- drop_path2
- norm3
- _calc_window_shift(target_window_size)
- _make_attention_mask() → None
Generates the attention mask used in the shifted-window case.
- init_weights()
- update_input_size(new_window_size: Tuple[int, int], new_feat_size: Tuple[int, int]) → None
Updates the input resolution and the window size, and with them the pair-wise relative positions.
- Parameters:
new_window_size (Tuple[int, int]) – New window size
new_feat_size (Tuple[int, int]) – New input resolution
- _shifted_window_attn(x)
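The shifted-window case can be sketched in two parts. First, the feature map is cyclically shifted with `torch.roll`. Second, a mask stops tokens that only became neighbours through the wrap-around from attending to each other.

The sketch follows the region-labelling scheme of the original Swin paper as implemented in timm, using a fill value of -100. It is an assumption that `_make_attention_mask` here does the same. The helper name `shifted_window_mask` is hypothetical.

```python
from typing import Tuple

import torch


def shifted_window_mask(
    feat_size: Tuple[int, int], window_size: Tuple[int, int], shift_size: Tuple[int, int]
) -> torch.Tensor:
    H, W = feat_size
    wh, ww = window_size
    sh, sw = shift_size
    assert sh > 0 and sw > 0, "this sketch assumes a non-zero shift in both directions"
    # Label the 3 x 3 regions that share a window only because of the cyclic shift.
    img_mask = torch.zeros(H, W)
    cnt = 0
    for hs in (slice(0, -wh), slice(-wh, -sh), slice(-sh, None)):
        for ws in (slice(0, -ww), slice(-ww, -sw), slice(-sw, None)):
            img_mask[hs, ws] = cnt
            cnt += 1
    # Partition the label map into windows: (num_windows, wh * ww).
    win = img_mask.reshape(H // wh, wh, W // ww, ww).permute(0, 2, 1, 3).reshape(-1, wh * ww)
    # Token pairs with different labels must not attend to each other.
    diff = win.unsqueeze(1) - win.unsqueeze(2)
    return torch.zeros_like(diff).masked_fill(diff != 0, -100.0)  # (num_windows, N, N)


x = torch.randn(1, 8, 8, 2)  # (B, H, W, C) feature map
shifted = torch.roll(x, shifts=(-2, -2), dims=(1, 2))  # cyclic shift before window attention
mask = shifted_window_mask((8, 8), (4, 4), (2, 2))
print(mask.shape)  # torch.Size([4, 16, 16])
```

After attention, rolling by the opposite shift, `(2, 2)`, restores the original layout. The top-left window contains only one region, so its mask is all zeros. The bottom-right window mixes four regions, so most of its token pairs are masked.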
- class credit.models.swin.PatchMerging(dim: int, norm_layer: Type[torch.nn.Module] = nn.LayerNorm)
Bases: torch.nn.Module
This class implements patch merging as a strided convolution preceded by a normalization.
- Parameters:
dim (int) – Number of input channels
norm_layer (Type[nn.Module]) – Type of normalization layer to be utilized
- norm
- reduction
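Patch merging can be sketched as follows. Each 2 x 2 neighbourhood is gathered into the channel axis, normalized, and projected from 4C to 2C by a bias-free linear layer. A linear map over non-overlapping 2 x 2 patches is equivalent to a 2 x 2 convolution with stride 2, which is why the docstring calls it a strided convolution.

The channel-last layout and the `norm` and `reduction` shapes are assumptions modelled on timm's SwinTransformerV2Cr. The class name is hypothetical.

```python
import torch
import torch.nn as nn


class PatchMergingSketch(nn.Module):
    """Hypothetical stand-in: (B, H, W, C) -> (B, H/2, W/2, 2C)."""

    def __init__(self, dim: int, norm_layer=nn.LayerNorm):
        super().__init__()
        self.norm = norm_layer(4 * dim)  # normalization applied before the reduction
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, H, W, C = x.shape
        # Gather each 2 x 2 patch into the channel axis: (B, H/2, W/2, 4C).
        x = x.reshape(B, H // 2, 2, W // 2, 2, C).permute(0, 1, 3, 4, 2, 5).flatten(3)
        return self.reduction(self.norm(x))


merge = PatchMergingSketch(dim=16)
y = merge(torch.randn(2, 8, 6, 16))
print(y.shape)  # torch.Size([2, 4, 3, 32])
```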
- class credit.models.swin.PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None)
Bases: torch.nn.Module
2D image to patch embedding.
- img_size
- patch_size
- grid_size
- num_patches
- proj
- norm
- forward(x)
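The attributes listed above suggest a convolutional patch embedding, which can be sketched as follows:

- `proj` is a `Conv2d` whose kernel size equals its stride.
- `grid_size` is `img_size // patch_size`.
- `num_patches` is the product of the grid dimensions.

The sketch assumes the optional norm is applied channel-last and the output stays (B, C, H, W), as in timm's SwinTransformerV2Cr. The class name is hypothetical.

```python
import torch
import torch.nn as nn


class PatchEmbedSketch(nn.Module):
    """Hypothetical stand-in for a conv-based 2D patch embedding."""

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None):
        super().__init__()
        self.img_size = (img_size, img_size) if isinstance(img_size, int) else tuple(img_size)
        self.patch_size = (patch_size, patch_size) if isinstance(patch_size, int) else tuple(patch_size)
        # One token per non-overlapping patch.
        self.grid_size = (self.img_size[0] // self.patch_size[0], self.img_size[1] // self.patch_size[1])
        self.num_patches = self.grid_size[0] * self.grid_size[1]
        # kernel_size == stride, so each output pixel sees exactly one patch.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=self.patch_size, stride=self.patch_size)
        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)  # (B, embed_dim, H / p, W / p)
        # Normalize over channels in channel-last layout, then restore (B, C, H, W).
        return self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)


pe = PatchEmbedSketch()
print(pe.grid_size, pe.num_patches)  # (14, 14) 196
```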
- class credit.models.swin.SwinTransformerV2CrStage(embed_dim: int, depth: int, downscale: bool, num_heads: int, feat_size: Tuple[int, int], window_size: Tuple[int, int], mlp_ratio: float = 4.0, init_values: float | None = 0.0, proj_drop: float = 0.0, drop_attn: float = 0.0, drop_path: List[float] | float = 0.0, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, extra_norm_period: int = 0, extra_norm_stage: bool = False, sequential_attn: bool = False, rel_pos: bool = True, grad_checkpointing: bool = False)
Bases: torch.nn.Module
This class implements a stage of the Swin transformer consisting of multiple layers.
- Parameters:
embed_dim (int) – Number of input channels
depth (int) – Depth of the stage (number of layers)
downscale (bool) – If true, the input is downsampled (see Fig. 3 of the V1 paper)
feat_size (Tuple[int, int]) – Input feature map size (H, W)
num_heads (int) – Number of attention heads to be utilized
window_size (Tuple[int, int]) – Window size to be utilized
mlp_ratio (float) – Ratio of the hidden dimension in the FFN to the input channels
proj_drop (float) – Dropout in input mapping
drop_attn (float) – Dropout rate of attention map
drop_path (List[float] | float) – Dropout in main path (a single rate, or one rate per block)
norm_layer (Type[nn.Module]) – Type of normalization layer to be utilized. Default: nn.LayerNorm
extra_norm_period (int) – Insert extra norm layer on main branch every N (period) blocks
extra_norm_stage (bool) – End each stage with an extra norm layer in main branch
sequential_attn (bool) – If true, sequential self-attention is performed
- downscale: bool
- feat_size: Tuple[int, int]
- grad_checkpointing = False
- blocks
- update_input_size(new_window_size: int, new_feat_size: Tuple[int, int]) → None
Updates the input resolution and the window size, and with them the pair-wise relative positions.
- Parameters:
new_window_size (int) – New window size
new_feat_size (Tuple[int, int]) – New input resolution
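How a stage might derive per-block settings from its arguments can be sketched as follows. The sketch assumes the standard Swin alternation used in timm's SwinTransformerV2CrStage, which this page does not confirm:

- Even blocks use regular windows; odd blocks shift by half a window.
- A list-valued `drop_path` supplies one rate per block.
- Extra norms are placed by `extra_norm_period` and `extra_norm_stage`.

The helper `stage_block_configs` is hypothetical.

```python
from typing import List, Tuple, Union


def stage_block_configs(
    depth: int,
    window_size: Tuple[int, int],
    drop_path: Union[List[float], float] = 0.0,
    extra_norm_period: int = 0,
    extra_norm_stage: bool = False,
) -> List[dict]:
    configs = []
    for index in range(depth):
        configs.append(
            {
                # Even blocks: regular windows. Odd blocks: shift by half a window.
                "shift_size": tuple(0 if index % 2 == 0 else w // 2 for w in window_size),
                # A list gives one rate per block; a float is shared by all blocks.
                "drop_path": drop_path[index] if isinstance(drop_path, list) else drop_path,
                # Extra norm every extra_norm_period blocks, or on the last block
                # when extra_norm_stage is set.
                "extra_norm": (extra_norm_period > 0 and (index + 1) % extra_norm_period == 0)
                or (extra_norm_stage and index == depth - 1),
            }
        )
    return configs


cfg = stage_block_configs(4, (8, 16), drop_path=[0.0, 0.1, 0.2, 0.3], extra_norm_period=2)
print([c["shift_size"] for c in cfg])  # [(0, 0), (4, 8), (0, 0), (4, 8)]
```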
- class credit.models.swin.SwinTransformerV2Cr(img_size: Tuple[int, int] = (224, 224), patch_size: int = 4, window_size: int | None = None, img_window_ratio: int = 32, channels: int = 4, levels: int = 15, surface_channels: int = 7, input_only_channels: int = 3, output_only_channels: int = 0, frames: int = 1, embed_dim: int = 96, depths: Tuple[int, Ellipsis] = (2, 2, 6, 2), num_heads: Tuple[int, Ellipsis] = (3, 6, 12, 24), mlp_ratio: float = 4.0, init_values: float | None = 0.0, drop_rate: float = 0.0, proj_drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, extra_norm_period: int = 0, extra_norm_stage: bool = False, sequential_attn: bool = False, global_pool: str = 'avg', weight_init='skip', full_pos_embed: bool = False, rel_pos: bool = True, checkpoint_stages: bool = False, residual: bool = False, use_spectral_norm: bool = False, padding_conf: dict = None, post_conf: dict = None, **kwargs: Any)
Bases: credit.models.base_model.BaseModel
Swin Transformer V2
A PyTorch implementation of Swin Transformer V2: Scaling Up Capacity and Resolution.
- Parameters:
img_size – Input resolution.
window_size – Window size. If None, img_size // img_window_ratio is used.
img_window_ratio – Ratio of image size to window size, used as the divisor when window_size is None.
patch_size – Patch size.
in_chans – Number of input channels.
depths – Depth of each stage (number of layers).
num_heads – Number of attention heads in each stage.
embed_dim – Patch embedding dimension.
num_classes – Number of output classes.
mlp_ratio – Ratio of the hidden dimension in the FFN to the input channels.
drop_rate – Dropout rate.
proj_drop_rate – Projection dropout rate.
attn_drop_rate – Dropout rate of attention map.
drop_path_rate – Stochastic depth rate.
norm_layer – Type of normalization layer to be utilized.
extra_norm_period – Insert extra norm layer on main branch every N (period) blocks in stage.
extra_norm_stage – End each stage with an extra norm layer in main branch.
sequential_attn – If true, sequential self-attention is performed.
padding_conf (dict) – Padding configuration.
post_conf (dict) – Configuration for post-block processing.
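The `in_chans` and `out_chans` attribute defaults (70 and 67) are consistent with a particular channel bookkeeping:

- Each upper-air variable contributes one channel per vertical level.
- Surface channels are added to both inputs and outputs.
- Input-only channels count only on the input side; output-only channels count only on the output side.

This formula is inferred from the defaults, not confirmed from the source, and `frames` is ignored. The window-size fallback follows the documented `img_size // img_window_ratio` rule. Both helper names are hypothetical.

```python
from typing import Tuple


def channel_counts(
    channels: int = 4,
    levels: int = 15,
    surface_channels: int = 7,
    input_only_channels: int = 3,
    output_only_channels: int = 0,
) -> Tuple[int, int]:
    # Upper-air variables contribute one channel per vertical level (assumption).
    upper_air = channels * levels
    in_chans = upper_air + surface_channels + input_only_channels
    out_chans = upper_air + surface_channels + output_only_channels
    return in_chans, out_chans


def default_window_size(img_size: Tuple[int, int], img_window_ratio: int = 32) -> Tuple[int, int]:
    # Used when window_size is None: img_size // img_window_ratio per dimension.
    return tuple(s // img_window_ratio for s in img_size)


print(channel_counts())  # (70, 67)
print(default_window_size((224, 224)))  # (7, 7)
```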
- use_padding
- patch_size: int = 4
- img_size: Tuple[int, int] = (224, 224)
- window_size: int
- num_features: int = 96
- frames = 1
- in_chans = 70
- out_chans = 67
- feature_info = []
- full_pos_embed = False
- checkpoint_stages = False
- residual = False
- depth
- use_post_block
- patch_embed
- stages
- head
- use_spectral_norm = False
- forward_features(x: torch.Tensor) → torch.Tensor
- forward_head(x: torch.Tensor) → torch.Tensor
- forward(x: torch.Tensor) → torch.Tensor
- update_input_size(new_img_size: Tuple[int, int] | None = None, new_window_size: int | None = None, img_window_ratio: int = 32) → None
Updates the input resolution and the window size, and with them the pair-wise relative positions.
- Parameters:
new_window_size (Optional[int]) – New window size; if None, new_img_size // img_window_ratio is used
new_img_size (Optional[Tuple[int, int]]) – New input resolution; if None, the current resolution is used
img_window_ratio (int) – Divisor for calculating the window size from the image size
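The effect of update_input_size on each stage can be sketched as follows, under these assumptions:

- The token grid is `img_size // patch_size`.
- Every stage after the first halves the resolution.
- The window is clamped so it never exceeds the feature map, which is assumed to be what `_calc_window_shift` does, as in timm.

The window size is treated as a per-dimension tuple. The helper name and the 640 x 1280 example grid are illustrative.

```python
from typing import List, Optional, Tuple


def stage_resolutions(
    img_size: Tuple[int, int],
    patch_size: int = 4,
    num_stages: int = 4,
    window_size: Optional[Tuple[int, int]] = None,
    img_window_ratio: int = 32,
) -> List[Tuple[Tuple[int, int], Tuple[int, int]]]:
    # Same fallback as the documented new_window_size=None case.
    if window_size is None:
        window_size = tuple(s // img_window_ratio for s in img_size)
    grid = (img_size[0] // patch_size, img_size[1] // patch_size)
    result = []
    for i in range(num_stages):
        # Each stage after the first halves the token grid (patch merging).
        feat = (grid[0] // 2**i, grid[1] // 2**i)
        # Assumption: the window is clamped so it never exceeds the feature map.
        win = tuple(min(f, w) for f, w in zip(feat, window_size))
        result.append((feat, win))
    return result


for feat, win in stage_resolutions((640, 1280)):
    print(feat, win)
```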
- group_matcher(coarse=False)
- set_grad_checkpointing(enable=True)
- get_classifier() → torch.nn.Module
Returns the classification head of the model.
- Returns:
head (nn.Module) – Current classification head
- reset_classifier(num_classes: int, global_pool: str | None = None) → None
Resets the classification head.
- Parameters:
num_classes (int) – Number of classes to be predicted
global_pool (str) – Unused
- credit.models.swin.init_weights(module: torch.nn.Module, name: str = '')
- credit.models.swin.image_height = 640