credit.models.swin


Attributes

logger

image_height

Classes

WindowMultiHeadAttentionNoPos

This class implements window-based Multi-Head-Attention without a relative position bias.

WindowMultiHeadAttention

This class implements window-based Multi-Head-Attention with log-spaced continuous position bias.

SwinTransformerV2CrBlock

This class implements the Swin transformer block.

PatchMerging

This class implements the patch merging as a strided convolution with a normalization before.

PatchEmbed

2D Image to Patch Embedding

SwinTransformerV2CrStage

This class implements a stage of the Swin transformer including multiple layers.

SwinTransformerV2Cr

Swin Transformer V2

Functions

apply_spectral_norm(model)

circular_pad1d(x, pad)

bchw_to_bhwc(x) → torch.Tensor

Permutes a tensor from the shape (B, C, H, W) to (B, H, W, C).

bhwc_to_bchw(x) → torch.Tensor

Permutes a tensor from the shape (B, H, W, C) to (B, C, H, W).

swin_from_yaml(fname[, checkpoint_stages])

swinv2net(params[, checkpoint_stages])

window_partition(x, window_size)

window_reverse(windows, window_size, img_size)

init_weights(module[, name])

Module Contents

credit.models.swin.logger
credit.models.swin.apply_spectral_norm(model)
credit.models.swin.circular_pad1d(x, pad)
credit.models.swin.bchw_to_bhwc(x: torch.Tensor) → torch.Tensor

Permutes a tensor from the shape (B, C, H, W) to (B, H, W, C).

credit.models.swin.bhwc_to_bchw(x: torch.Tensor) → torch.Tensor

Permutes a tensor from the shape (B, H, W, C) to (B, C, H, W).
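To make the two layout helpers concrete, here is a minimal sketch assuming they are plain `permute` calls; the module's actual implementation may differ (for example, by also calling `.contiguous()`).

```python
import torch


def bchw_to_bhwc(x: torch.Tensor) -> torch.Tensor:
    # Move the channel axis from position 1 to the last position.
    return x.permute(0, 2, 3, 1)


def bhwc_to_bchw(x: torch.Tensor) -> torch.Tensor:
    # Move the channel axis from the last position back to position 1.
    return x.permute(0, 3, 1, 2)


x = torch.randn(2, 3, 4, 5)           # (B, C, H, W)
y = bchw_to_bhwc(x)
print(y.shape)                         # torch.Size([2, 4, 5, 3])
print(torch.equal(bhwc_to_bchw(y), x)) # True
```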

credit.models.swin.swin_from_yaml(fname, checkpoint_stages=False)
credit.models.swin.swinv2net(params, checkpoint_stages=False)
credit.models.swin.window_partition(x, window_size: Tuple[int, int])
Parameters:
  • x – Input tensor of shape (B, H, W, C)

  • window_size (Tuple[int, int]) – Window size

Returns:

windows – Tensor of shape (num_windows * B, window_size[0], window_size[1], C)
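The partitioning can be sketched as follows. This is an illustrative re-implementation, assuming H and W are divisible by the window size and that windows are ordered row-major within each image; it is not the module's own code.

```python
import torch
from typing import Tuple


def window_partition(x: torch.Tensor, window_size: Tuple[int, int]) -> torch.Tensor:
    # x: (B, H, W, C); H and W are assumed divisible by the window size.
    B, H, W, C = x.shape
    wh, ww = window_size
    # Split H into (H // wh, wh) blocks and W into (W // ww, ww) blocks.
    x = x.reshape(B, H // wh, wh, W // ww, ww, C)
    # Group the two window-grid axes with the batch axis, then flatten them.
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, wh, ww, C)


x = torch.randn(2, 8, 12, 3)
windows = window_partition(x, (4, 4))
print(windows.shape)  # torch.Size([12, 4, 4, 3])
```

Each 8x12 image yields (8 / 4) * (12 / 4) = 6 windows, so a batch of two gives 12.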

credit.models.swin.window_reverse(windows, window_size: Tuple[int, int], img_size: Tuple[int, int])
Parameters:
  • windows – Tensor of shape (num_windows * B, window_size[0], window_size[1], C)

  • window_size (Tuple[int, int]) – Window size

  • img_size (Tuple[int, int]) – Image size (H, W)

Returns:

x – Tensor of shape (B, H, W, C)
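`window_reverse` undoes the partitioning. The sketch below repeats an assumed `window_partition` so that it is self-contained and checks that the round trip recovers the input; as above, this is an illustration rather than the module's code.

```python
import torch
from typing import Tuple


def window_partition(x: torch.Tensor, window_size: Tuple[int, int]) -> torch.Tensor:
    B, H, W, C = x.shape
    wh, ww = window_size
    x = x.reshape(B, H // wh, wh, W // ww, ww, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, wh, ww, C)


def window_reverse(windows: torch.Tensor, window_size: Tuple[int, int],
                   img_size: Tuple[int, int]) -> torch.Tensor:
    wh, ww = window_size
    H, W = img_size
    C = windows.shape[-1]
    # Recover the (B, H // wh, W // ww) window grid, then interleave the
    # in-window axes back into full rows and columns.
    x = windows.reshape(-1, H // wh, W // ww, wh, ww, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, H, W, C)


x = torch.randn(2, 8, 12, 3)
restored = window_reverse(window_partition(x, (2, 4)), (2, 4), (8, 12))
print(torch.equal(restored, x))  # True
```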

class credit.models.swin.WindowMultiHeadAttentionNoPos(dim: int, num_heads: int, window_size: Tuple[int, int], drop_attn: float = 0.0, drop_proj: float = 0.0, sequential_attn: bool = False)

Bases: torch.nn.Module

This class implements window-based Multi-Head-Attention without a relative position bias.

Parameters:
  • dim (int) – Number of input features

  • window_size (Tuple[int, int]) – Window size

  • num_heads (int) – Number of attention heads

  • drop_attn (float) – Dropout rate of attention map

  • drop_proj (float) – Dropout rate after projection

  • sequential_attn (bool) – If True, sequential self-attention is performed

in_features: int
window_size: Tuple[int, int]
num_heads: int
sequential_attn: bool = False
qkv
attn_drop
proj
proj_drop
logit_scale
update_input_size(new_window_size: int, **kwargs: Any) → None

Updates the window size and, with it, the pair-wise relative positions.

Parameters:
  • new_window_size (int) – New window size

  • kwargs (Any) – Unused

forward(x: torch.Tensor, mask: torch.Tensor | None = None) → torch.Tensor

Forward pass.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (B * windows, N, C)

  • mask (Optional[torch.Tensor]) – Attention mask for the shifted-window case

Returns:

Output tensor of shape (B * windows, N, C)
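Swin Transformer V2 replaces dot-product attention with scaled cosine attention, and the `logit_scale` attribute suggests this class does the same. The hypothetical `CosineWindowAttention` below sketches such a forward pass without any position bias. The initial scale of log(10) and the clamp at log(100) are assumptions taken from common Swin V2 implementations, and dropout is omitted.

```python
import math
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class CosineWindowAttention(nn.Module):
    """Hypothetical sketch of window attention with no position bias."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # One learnable log temperature per head (assumed init: log(10)).
        self.logit_scale = nn.Parameter(torch.log(10 * torch.ones(num_heads)))

    def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        Bw, N, C = x.shape  # (B * windows, N, C)
        qkv = self.qkv(x).view(Bw, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)  # each (Bw, heads, N, head_dim)
        # Cosine similarity between every query and key in the window.
        attn = F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1)
        # Scale by exp(logit_scale), clamped so the scale never exceeds 100.
        scale = torch.clamp(self.logit_scale, max=math.log(1.0 / 0.01)).exp()
        attn = attn * scale.view(1, -1, 1, 1)
        if mask is not None:
            # mask: (num_windows, N, N), shared by every image in the batch.
            nw = mask.shape[0]
            attn = attn.view(Bw // nw, nw, self.num_heads, N, N) + mask[None, :, None]
            attn = attn.view(Bw, self.num_heads, N, N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(Bw, N, C)
        return self.proj(out)


attn = CosineWindowAttention(dim=32, num_heads=4)
x = torch.randn(6, 16, 32)      # 6 windows of 4x4 tokens
mask = torch.zeros(3, 16, 16)   # 3 windows per image, so a batch of 2
print(attn(x, mask).shape)      # torch.Size([6, 16, 32])
```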

class credit.models.swin.WindowMultiHeadAttention(dim: int, num_heads: int, window_size: Tuple[int, int], drop_attn: float = 0.0, drop_proj: float = 0.0, meta_hidden_dim: int = 384, sequential_attn: bool = False)

Bases: torch.nn.Module

This class implements window-based Multi-Head-Attention with log-spaced continuous position bias.

Parameters:
  • dim (int) – Number of input features

  • window_size (Tuple[int, int]) – Window size

  • num_heads (int) – Number of attention heads

  • drop_attn (float) – Dropout rate of attention map

  • drop_proj (float) – Dropout rate after projection

  • meta_hidden_dim (int) – Number of hidden features in the two layer MLP meta network

  • sequential_attn (bool) – If True, sequential self-attention is performed

in_features: int
window_size: Tuple[int, int]
num_heads: int
sequential_attn: bool = False
qkv
attn_drop
proj
proj_drop
meta_mlp
logit_scale
_make_pair_wise_relative_positions() → None

Initializes the pair-wise relative positions used to compute the positional biases.
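The log-spaced continuous position bias starts from the pair-wise (row, column) offsets between tokens in a window, compressed as sign(d) * log(1 + |d|) so that large offsets grow slowly. A minimal sketch, using the hypothetical helper name `log_spaced_relative_positions`:

```python
import torch
from typing import Tuple


def log_spaced_relative_positions(window_size: Tuple[int, int]) -> torch.Tensor:
    wh, ww = window_size
    # (row, col) coordinate of every token in the window: (2, wh * ww).
    coords = torch.stack(
        torch.meshgrid(torch.arange(wh), torch.arange(ww), indexing="ij")
    ).flatten(1)
    # Offsets between every pair of tokens: (wh * ww, wh * ww, 2).
    rel = (coords[:, :, None] - coords[:, None, :]).permute(1, 2, 0).float()
    # Log-spacing keeps the sign but compresses the magnitude.
    return torch.sign(rel) * torch.log1p(rel.abs())


rel = log_spaced_relative_positions((4, 4))
print(rel.shape)  # torch.Size([16, 16, 2])
```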

update_input_size(new_window_size: int, **kwargs: Any) → None

Updates the window size and, with it, the pair-wise relative positions.

Parameters:
  • new_window_size (int) – New window size

  • kwargs (Any) – Unused

_relative_positional_encodings() → torch.Tensor

Computes the relative positional encodings.

Returns:

relative_position_bias (torch.Tensor) – Relative positional encodings of shape (1, num_heads, N, N), where N = window_size[0] * window_size[1] is the number of tokens per window
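The meta network then maps each log-spaced offset to one bias per head, and the result is reshaped to (1, num_heads, N, N) with N = window_size[0] * window_size[1]. In this sketch `relative_position_bias` is a hypothetical helper, and the ReLU activation and plain `nn.Sequential` MLP are assumptions; only the default hidden width of 384 comes from the signature.

```python
import torch
import torch.nn as nn
from typing import Tuple


def relative_position_bias(meta_mlp: nn.Module, window_size: Tuple[int, int],
                           num_heads: int) -> torch.Tensor:
    wh, ww = window_size
    n = wh * ww  # tokens per window
    # Pair-wise (row, col) offsets between all tokens, flattened to (n * n, 2).
    coords = torch.stack(
        torch.meshgrid(torch.arange(wh), torch.arange(ww), indexing="ij")
    ).flatten(1)
    rel = (coords[:, :, None] - coords[:, None, :]).permute(1, 2, 0).reshape(-1, 2).float()
    rel_log = torch.sign(rel) * torch.log1p(rel.abs())
    # The meta network predicts one bias per head for every offset: (n * n, num_heads).
    bias = meta_mlp(rel_log)
    # Rearrange to (1, num_heads, n, n) so it broadcasts over the batch of windows.
    return bias.transpose(0, 1).reshape(num_heads, n, n).unsqueeze(0)


num_heads, meta_hidden_dim = 4, 384
# Two-layer meta MLP; the ReLU activation is an assumption.
meta_mlp = nn.Sequential(nn.Linear(2, meta_hidden_dim), nn.ReLU(),
                         nn.Linear(meta_hidden_dim, num_heads))
bias = relative_position_bias(meta_mlp, (4, 4), num_heads)
print(bias.shape)  # torch.Size([1, 4, 16, 16])
```

Because the bias is a function of the offset only, token pairs with the same offset share the same bias, which is what lets the window size change after training.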

forward(x: torch.Tensor, mask: torch.Tensor | None = None) → torch.Tensor

Forward pass.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (B * windows, N, C)

  • mask (Optional[torch.Tensor]) – Attention mask for the shifted-window case

Returns:

Output tensor of shape (B * windows, N, C)

class credit.models.swin.SwinTransformerV2CrBlock(dim: int, num_heads: int, feat_size: Tuple[int, int], window_size: Tuple[int, int], shift_size: Tuple[int, int] = (0, 0), mlp_ratio: float = 4.0, init_values: float | None = 0, proj_drop: float = 0.0, drop_attn: float = 0.0, drop_path: float = 0.0, extra_norm: bool = False, sequential_attn: bool = False, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, rel_pos: bool = True)

Bases: torch.nn.Module

This class implements the Swin transformer block.

Parameters:
  • dim (int) – Number of input channels

  • num_heads (int) – Number of attention heads to be utilized

  • feat_size (Tuple[int, int]) – Input resolution

  • window_size (Tuple[int, int]) – Window size to be utilized

  • shift_size (Tuple[int, int]) – Shifting size to be used

  • mlp_ratio (float) – Ratio of the hidden dimension in the FFN to the input channels

  • proj_drop (float) – Dropout in input mapping

  • drop_attn (float) – Dropout rate of attention map

  • drop_path (float) – Dropout in main path

  • extra_norm (bool) – Insert extra norm on ‘main’ branch if True

  • sequential_attn (bool) – If True, sequential self-attention is performed

  • norm_layer (Type[nn.Module]) – Type of normalization layer to be utilized

dim: int
feat_size: Tuple[int, int]
target_shift_size: Tuple[int, int] = (0, 0)
window_area
init_values: float | None = 0
attn
norm1
drop_path1
mlp
norm2
drop_path2
norm3
_calc_window_shift(target_window_size)
_make_attention_mask() → None

Generates the attention mask used in the shifted-window case.
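In the shifted-window case the feature map is cyclically shifted, so one window can contain tokens from regions that were not adjacent in the original image. The sketch below builds such a mask in the style of the original Swin implementation: each of the nine regions receives a label, and token pairs with different labels get a large negative value (-100 here, an assumed constant) so that softmax suppresses them. It assumes square windows and shifts and uses the hypothetical name `shifted_window_mask`.

```python
import torch


def shifted_window_mask(H: int, W: int, window: int, shift: int) -> torch.Tensor:
    # Label the nine regions created by the cyclic shift.
    img_mask = torch.zeros(H, W)
    slices = (slice(0, -window), slice(-window, -shift), slice(-shift, None))
    region = 0
    for hs in slices:
        for ws in slices:
            img_mask[hs, ws] = region
            region += 1
    # Split the label map into windows: (num_windows, window * window).
    labels = (img_mask.view(H // window, window, W // window, window)
              .permute(0, 2, 1, 3)
              .reshape(-1, window * window))
    # Tokens from different regions must not attend to each other.
    diff = labels[:, None, :] - labels[:, :, None]
    return torch.zeros_like(diff).masked_fill(diff != 0, -100.0)


mask = shifted_window_mask(8, 8, window=4, shift=2)
print(mask.shape)  # torch.Size([4, 16, 16])
```

The top-left window lies entirely in one region, so its mask is all zeros; windows touching the shifted border mix regions and are partly masked.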

init_weights()
update_input_size(new_window_size: Tuple[int, int], new_feat_size: Tuple[int, int]) → None

Updates the input resolution and the window size and, with them, the pair-wise relative positions.

Parameters:
  • new_window_size (Tuple[int, int]) – New window size

  • new_feat_size (Tuple[int, int]) – New input resolution

_shifted_window_attn(x)
forward(x: torch.Tensor) → torch.Tensor

Forward pass.

Parameters:

x (torch.Tensor) – Input tensor of the shape [B, C, H, W]

Returns:

output (torch.Tensor) – Output tensor of the shape [B, C, H, W]

class credit.models.swin.PatchMerging(dim: int, norm_layer: Type[torch.nn.Module] = nn.LayerNorm)

Bases: torch.nn.Module

This class implements the patch merging as a strided convolution with a normalization before.

Parameters:
  • dim (int) – Number of input channels

  • norm_layer (Type[nn.Module]) – Type of normalization layer to be utilized

norm
reduction
forward(x: torch.Tensor) → torch.Tensor

Forward pass.

Parameters:

x (torch.Tensor) – Input tensor of the shape [B, C, H, W]

Returns:

output (torch.Tensor) – Output tensor of the shape [B, 2 * C, H // 2, W // 2]
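One way to realize this interface is to gather every 2x2 neighbourhood into the channel axis, normalize it, and apply a linear map from 4C to 2C channels; this is equivalent to a 2x2, stride-2 convolution preceded by normalization. The hypothetical `PatchMergingSketch` below assumes this layout and a bias-free reduction; the module's internals may differ.

```python
import torch
import torch.nn as nn


class PatchMergingSketch(nn.Module):
    """Hypothetical sketch: [B, C, H, W] -> [B, 2C, H // 2, W // 2]."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        # Bias-free reduction is an assumption.
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        x = x.permute(0, 2, 3, 1)  # (B, H, W, C)
        # Gather each 2x2 neighbourhood into the channel axis: (B, H/2, W/2, 4C).
        x = x.reshape(B, H // 2, 2, W // 2, 2, C).permute(0, 1, 3, 4, 2, 5).flatten(3)
        x = self.reduction(self.norm(x))  # (B, H/2, W/2, 2C)
        return x.permute(0, 3, 1, 2)       # (B, 2C, H/2, W/2)


pm = PatchMergingSketch(dim=8)
print(pm(torch.randn(2, 8, 6, 10)).shape)  # torch.Size([2, 16, 3, 5])
```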

class credit.models.swin.PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None)

Bases: torch.nn.Module

2D Image to Patch Embedding

img_size
patch_size
grid_size
num_patches
proj
norm
forward(x)
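PatchEmbed is documented only by its summary line. Assuming it follows the common Vision Transformer pattern of a convolution whose kernel size and stride both equal `patch_size`, a minimal sketch (the hypothetical `PatchEmbedSketch`, which omits the optional `norm_layer`) looks like this:

```python
import torch
import torch.nn as nn


class PatchEmbedSketch(nn.Module):
    """Hypothetical sketch of a 2D image-to-patch embedding."""

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        if isinstance(img_size, int):
            img_size = (img_size, img_size)
        self.grid_size = (img_size[0] // patch_size, img_size[1] // patch_size)
        self.num_patches = self.grid_size[0] * self.grid_size[1]
        # Non-overlapping patches: kernel size equals stride.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)  # (B, embed_dim, H // patch_size, W // patch_size)


pe = PatchEmbedSketch()
print(pe.num_patches)                         # 196
print(pe(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 768, 14, 14])
```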
class credit.models.swin.SwinTransformerV2CrStage(embed_dim: int, depth: int, downscale: bool, num_heads: int, feat_size: Tuple[int, int], window_size: Tuple[int, int], mlp_ratio: float = 4.0, init_values: float | None = 0.0, proj_drop: float = 0.0, drop_attn: float = 0.0, drop_path: List[float] | float = 0.0, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, extra_norm_period: int = 0, extra_norm_stage: bool = False, sequential_attn: bool = False, rel_pos: bool = True, grad_checkpointing: bool = False)

Bases: torch.nn.Module

This class implements a stage of the Swin transformer including multiple layers.

Parameters:
  • embed_dim (int) – Number of input channels

  • depth (int) – Depth of the stage (number of layers)

  • downscale (bool) – If True, the input is downsampled (see Fig. 3 of the V1 paper)

  • feat_size (Tuple[int, int]) – Input feature map size (H, W)

  • num_heads (int) – Number of attention heads to be utilized

  • window_size (Tuple[int, int]) – Window size to be utilized

  • mlp_ratio (float) – Ratio of the hidden dimension in the FFN to the input channels

  • proj_drop (float) – Dropout in input mapping

  • drop_attn (float) – Dropout rate of attention map

  • drop_path (float or List[float]) – Dropout in main path (stochastic depth); a list gives one rate per block

  • norm_layer (Type[nn.Module]) – Type of normalization layer to be utilized. Default: nn.LayerNorm

  • extra_norm_period (int) – Insert extra norm layer on main branch every N (period) blocks

  • extra_norm_stage (bool) – End each stage with an extra norm layer in main branch

  • sequential_attn (bool) – If True, sequential self-attention is performed

downscale: bool
feat_size: Tuple[int, int]
grad_checkpointing = False
blocks
update_input_size(new_window_size: int, new_feat_size: Tuple[int, int]) → None

Updates the input resolution and the window size and, with them, the pair-wise relative positions.

Parameters:
  • new_window_size (int) – New window size

  • new_feat_size (Tuple[int, int]) – New input resolution

forward(x: torch.Tensor) → torch.Tensor

Forward pass.

Parameters:

x (torch.Tensor) – Input tensor of the shape [B, C, H, W] or [B, L, C]

Returns:

output (torch.Tensor) – Output tensor of the shape [B, 2 * C, H // 2, W // 2] when downscale is True; otherwise the channel count and spatial size are unchanged

class credit.models.swin.SwinTransformerV2Cr(img_size: Tuple[int, int] = (224, 224), patch_size: int = 4, window_size: int | None = None, img_window_ratio: int = 32, channels: int = 4, levels: int = 15, surface_channels: int = 7, input_only_channels: int = 3, output_only_channels: int = 0, frames: int = 1, embed_dim: int = 96, depths: Tuple[int, ...] = (2, 2, 6, 2), num_heads: Tuple[int, ...] = (3, 6, 12, 24), mlp_ratio: float = 4.0, init_values: float | None = 0.0, drop_rate: float = 0.0, proj_drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, extra_norm_period: int = 0, extra_norm_stage: bool = False, sequential_attn: bool = False, global_pool: str = 'avg', weight_init='skip', full_pos_embed: bool = False, rel_pos: bool = True, checkpoint_stages: bool = False, residual: bool = False, use_spectral_norm: bool = False, padding_conf: dict = None, post_conf: dict = None, **kwargs: Any)

Bases: credit.models.base_model.BaseModel

Swin Transformer V2

A PyTorch implementation of Swin Transformer V2: Scaling Up Capacity and Resolution (https://arxiv.org/pdf/2111.09883)

Parameters:
  • img_size – Input resolution.

  • window_size – Window size. If None, computed as img_size // img_window_ratio.

  • img_window_ratio – Ratio of image size to window size, used to compute window_size when it is None.

  • patch_size – Patch size.

  • in_chans – Number of input channels.

  • depths – Depth of the stage (number of layers).

  • num_heads – Number of attention heads to be utilized.

  • embed_dim – Patch embedding dimension.

  • num_classes – Number of output classes.

  • mlp_ratio – Ratio of the hidden dimension in the FFN to the input channels.

  • drop_rate – Dropout rate.

  • proj_drop_rate – Projection dropout rate.

  • attn_drop_rate – Dropout rate of attention map.

  • drop_path_rate – Stochastic depth rate.

  • norm_layer – Type of normalization layer to be utilized.

  • extra_norm_period – Insert extra norm layer on main branch every N (period) blocks in stage

  • extra_norm_stage – End each stage with an extra norm layer in main branch

  • sequential_attn – If True, sequential self-attention is performed.

  • padding_conf (dict) – Padding configuration

  • post_conf (dict) – Configuration for post-block processing
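In common Swin implementations, `drop_path_rate` is spread over all blocks as stochastic depth that grows linearly with depth and is then split per stage. Assuming this model follows that convention (not confirmed on this page), the per-stage rates for the default depths could be computed as:

```python
import torch

depths = (2, 2, 6, 2)
drop_path_rate = 0.2
# One rate per block, increasing linearly from 0 to drop_path_rate across the
# whole network, then split into one list per stage.
dpr = [r.tolist() for r in torch.linspace(0, drop_path_rate, sum(depths)).split(list(depths))]
print([len(stage) for stage in dpr])  # [2, 2, 6, 2]
```

Each inner list would then be passed as the `drop_path` argument of the corresponding SwinTransformerV2CrStage.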

use_padding
patch_size: int = 4
img_size: Tuple[int, int] = (224, 224)
window_size: int
num_features: int = 96
frames = 1
in_chans = 70
out_chans = 67
feature_info = []
full_pos_embed = False
checkpoint_stages = False
residual = False
depth
use_post_block
patch_embed
stages
head
use_spectral_norm = False
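The default values in_chans = 70 and out_chans = 67 match a simple count built from the constructor defaults. Assuming the two attributes are derived this way (an inference from the numbers, not something this page states), the arithmetic is:

```python
# Default constructor arguments of SwinTransformerV2Cr.
channels, levels = 4, 15
surface_channels = 7
input_only_channels, output_only_channels = 3, 0

# Assumed derivation: channels * levels, plus surface channels, plus the
# channels that appear only on the input (or only on the output) side.
in_chans = channels * levels + surface_channels + input_only_channels
out_chans = channels * levels + surface_channels + output_only_channels
print(in_chans, out_chans)  # 70 67
```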
forward_features(x: torch.Tensor) → torch.Tensor
forward_head(x: torch.Tensor) → torch.Tensor
forward(x: torch.Tensor) → torch.Tensor
update_input_size(new_img_size: Tuple[int, int] | None = None, new_window_size: int | None = None, img_window_ratio: int = 32) → None

Updates the image resolution to be processed and the window size and, with them, the pair-wise relative positions.

Parameters:
  • new_window_size (Optional[int]) – New window size; if None, computed as new_img_size // img_window_ratio

  • new_img_size (Optional[Tuple[int, int]]) – New input resolution; if None, the current resolution is used

  • img_window_ratio (int) – Divisor used to compute the window size from the image size
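When no window size is given, it is derived from the image size. A minimal sketch, assuming a plain per-dimension integer division (the helper name `window_size_from_ratio` is hypothetical):

```python
from typing import Tuple


def window_size_from_ratio(img_size: Tuple[int, int], img_window_ratio: int = 32) -> Tuple[int, ...]:
    # Integer-divide each spatial dimension by the ratio.
    return tuple(s // img_window_ratio for s in img_size)


print(window_size_from_ratio((224, 224)))  # (7, 7)
```

With the defaults img_size = (224, 224) and img_window_ratio = 32, this gives the 7x7 windows familiar from the original Swin models.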

group_matcher(coarse=False)
set_grad_checkpointing(enable=True)
get_classifier() → torch.nn.Module

Returns the classification head of the model.

Returns:

head (nn.Module) – Current classification head

reset_classifier(num_classes: int, global_pool: str | None = None) → None

Resets the classification head.

Parameters:
  • num_classes (int) – Number of classes to be predicted

  • global_pool (str) – Unused

credit.models.swin.init_weights(module: torch.nn.Module, name: str = '')
credit.models.swin.image_height = 640