credit.models.swin#

Attributes#

`logger`
`image_height`

Classes#

`WindowMultiHeadAttentionNoPos`	This class implements window-based Multi-Head-Attention without the log-spaced continuous position bias.
`WindowMultiHeadAttention`	This class implements window-based Multi-Head-Attention with log-spaced continuous position bias.
`SwinTransformerV2CrBlock`	This class implements the Swin transformer block.
`PatchMerging`	This class implements the patch merging as a strided convolution with a normalization before.
`PatchEmbed`	2D Image to Patch Embedding
`SwinTransformerV2CrStage`	This class implements a stage of the Swin transformer including multiple layers.
`SwinTransformerV2Cr`	Swin Transformer V2

Functions#

`apply_spectral_norm`(model)
`circular_pad1d`(x, pad)
`bchw_to_bhwc`(x) → torch.Tensor	Permutes a tensor from the shape (B, C, H, W) to (B, H, W, C).
`bhwc_to_bchw`(x) → torch.Tensor	Permutes a tensor from the shape (B, H, W, C) to (B, C, H, W).
`swin_from_yaml`(fname[, checkpoint_stages])
`swinv2net`(params[, checkpoint_stages])
`window_partition`(x, window_size)
`window_reverse`(windows, window_size, img_size)
`init_weights`(module[, name])

Module Contents#

credit.models.swin.logger#

credit.models.swin.apply_spectral_norm(model)#

credit.models.swin.circular_pad1d(x, pad)#

credit.models.swin.bchw_to_bhwc(x: torch.Tensor) → torch.Tensor#

Permutes a tensor from the shape (B, C, H, W) to (B, H, W, C).

credit.models.swin.bhwc_to_bchw(x: torch.Tensor) → torch.Tensor#

Permutes a tensor from the shape (B, H, W, C) to (B, C, H, W).
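The two layout helpers are inverse axis permutations. A minimal sketch of the shape bookkeeping, using plain tuples instead of tensors; `permute_shape` and the two constant names are hypothetical and not part of this module:

```python
def permute_shape(shape, order):
    """Return the shape obtained by reordering axes, as tensor.permute(*order) would."""
    return tuple(shape[axis] for axis in order)


BCHW_TO_BHWC = (0, 2, 3, 1)  # channels move to the last axis
BHWC_TO_BCHW = (0, 3, 1, 2)  # channels move back to axis 1

bchw = (8, 70, 640, 1280)  # hypothetical (B, C, H, W)
bhwc = permute_shape(bchw, BCHW_TO_BHWC)
assert bhwc == (8, 640, 1280, 70)
assert permute_shape(bhwc, BHWC_TO_BCHW) == bchw  # round trip restores the layout
```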

credit.models.swin.swin_from_yaml(fname, checkpoint_stages=False)#

credit.models.swin.swinv2net(params, checkpoint_stages=False)#

credit.models.swin.window_partition(x, window_size: Tuple[int, int])#

Parameters:

x – Input tensor of the shape (B, H, W, C)
window_size (Tuple[int, int]) – Window size

Returns:

windows – Tensor of the shape (num_windows * B, window_size[0], window_size[1], C)

credit.models.swin.window_reverse(windows, window_size: Tuple[int, int], img_size: Tuple[int, int])#

Parameters:

windows – (num_windows * B, window_size[0], window_size[1], C)
window_size (Tuple[int, int]) – Window size
img_size (Tuple[int, int]) – Image size

Returns:

Tensor of the shape (B, H, W, C)
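To illustrate how window_partition and window_reverse relate, here is a plain-Python sketch on a single H x W grid (no batch or channel axis). The `_2d` helper names are hypothetical, and the row-major window order is an assumption based on the usual Swin layout; the real functions operate on (B, H, W, C) tensors.

```python
def window_partition_2d(grid, window_size):
    """Split an H x W grid (list of rows) into non-overlapping windows.

    Windows are returned in row-major order, each as a list of rows.
    """
    wh, ww = window_size
    height, width = len(grid), len(grid[0])
    assert height % wh == 0 and width % ww == 0, "grid must tile evenly"
    windows = []
    for top in range(0, height, wh):
        for left in range(0, width, ww):
            windows.append([row[left:left + ww] for row in grid[top:top + wh]])
    return windows


def window_reverse_2d(windows, window_size, img_size):
    """Inverse of window_partition_2d: stitch windows back into an H x W grid."""
    wh, ww = window_size
    height, width = img_size
    per_row = width // ww  # number of windows along the width
    grid = [[None] * width for _ in range(height)]
    for index, window in enumerate(windows):
        top, left = (index // per_row) * wh, (index % per_row) * ww
        for dy, row in enumerate(window):
            grid[top + dy][left:left + ww] = row
    return grid


grid = [[r * 6 + c for c in range(6)] for r in range(4)]  # H=4, W=6
windows = window_partition_2d(grid, (2, 3))
print(len(windows))  # (4 // 2) * (6 // 3) = 4 windows
print(windows[1])    # rows 0-1, columns 3-5: [[3, 4, 5], [9, 10, 11]]
assert window_reverse_2d(windows, (2, 3), (4, 6)) == grid
```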

class credit.models.swin.WindowMultiHeadAttentionNoPos(dim: int, num_heads: int, window_size: Tuple[int, int], drop_attn: float = 0.0, drop_proj: float = 0.0, sequential_attn: bool = False)#

Bases: torch.nn.Module

This class implements window-based Multi-Head-Attention without the log-spaced continuous position bias.

Parameters:

dim (int) – Number of input features
num_heads (int) – Number of attention heads
window_size (Tuple[int, int]) – Window size
drop_attn (float) – Dropout rate of attention map
drop_proj (float) – Dropout rate after projection
sequential_attn (bool) – If true sequential self-attention is performed

in_features: int#

window_size: Tuple[int, int]#

num_heads: int#

sequential_attn: bool = False#

qkv#

attn_drop#

proj#

proj_drop#

logit_scale#

update_input_size(new_window_size: int, **kwargs: Any) → None#

Method updates the window size.

Parameters:

new_window_size (int) – New window size
kwargs (Any) – Unused

forward(x: torch.Tensor, mask: torch.Tensor | None = None) → torch.Tensor#

Forward pass.

Parameters:

x (torch.Tensor) – Input tensor of the shape (B * windows, N, C)
mask (Optional[torch.Tensor]) – Attention mask for the shift case

Returns:

Output tensor of the shape (B * windows, N, C)

class credit.models.swin.WindowMultiHeadAttention(dim: int, num_heads: int, window_size: Tuple[int, int], drop_attn: float = 0.0, drop_proj: float = 0.0, meta_hidden_dim: int = 384, sequential_attn: bool = False)#

Bases: torch.nn.Module

This class implements window-based Multi-Head-Attention with log-spaced continuous position bias.

Parameters:

dim (int) – Number of input features
window_size (Tuple[int, int]) – Window size
num_heads (int) – Number of attention heads
drop_attn (float) – Dropout rate of attention map
drop_proj (float) – Dropout rate after projection
meta_hidden_dim (int) – Number of hidden features in the two layer MLP meta network
sequential_attn (bool) – If true sequential self-attention is performed

in_features: int#

window_size: Tuple[int, int]#

num_heads: int#

sequential_attn: bool = False#

qkv#

attn_drop#

proj#

proj_drop#

meta_mlp#

logit_scale#

_make_pair_wise_relative_positions() → None#

Method initializes the pair-wise relative positions to compute the positional biases.

update_input_size(new_window_size: int, **kwargs: Any) → None#

Method updates the window size and, as a consequence, the pair-wise relative positions.

Parameters:

new_window_size (int) – New window size
kwargs (Any) – Unused

_relative_positional_encodings() → torch.Tensor#

Method computes the relative positional encodings

Returns:

relative_position_bias (torch.Tensor) – Relative positional encodings of the shape (1, number of heads, window size ** 2, window size ** 2)
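The "log-spaced" part of the position bias can be sketched in plain Python. In Swin Transformer V2 each relative offset d is mapped to sign(d) * log(1 + |d|) before it is fed to the meta network. The helper below is hypothetical and only builds those coordinates. It is assumed, not confirmed by this page, that meta_mlp then maps every 2-vector to one bias value per head.

```python
import math


def log_spaced_relative_coords(window_size):
    """Log-spaced (dy, dx) offsets between every pair of positions in a window.

    Returns an N x N nested list of 2-tuples, where N = window height * width.
    """
    wh, ww = window_size
    positions = [(y, x) for y in range(wh) for x in range(ww)]

    def log_space(delta):
        # sign(delta) * log(1 + |delta|): large offsets are compressed
        return math.copysign(math.log1p(abs(delta)), delta)

    return [
        [(log_space(qy - ky), log_space(qx - kx)) for ky, kx in positions]
        for qy, qx in positions
    ]


coords = log_spaced_relative_coords((2, 2))
print(len(coords), len(coords[0]))  # N x N pairs with N = 2 * 2
print(coords[3][0])                 # offset between the last and the first position
```

The N x N layout matches the two trailing dimensions (window size ** 2, window size ** 2) of the returned bias.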

forward(x: torch.Tensor, mask: torch.Tensor | None = None) → torch.Tensor#

Forward pass.

Parameters:

x (torch.Tensor) – Input tensor of the shape (B * windows, N, C)
mask (Optional[torch.Tensor]) – Attention mask for the shift case

Returns:

Output tensor of the shape (B * windows, N, C)

class credit.models.swin.SwinTransformerV2CrBlock(dim: int, num_heads: int, feat_size: Tuple[int, int], window_size: Tuple[int, int], shift_size: Tuple[int, int] = (0, 0), mlp_ratio: float = 4.0, init_values: float | None = 0, proj_drop: float = 0.0, drop_attn: float = 0.0, drop_path: float = 0.0, extra_norm: bool = False, sequential_attn: bool = False, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, rel_pos: bool = True)#

Bases: torch.nn.Module

This class implements the Swin transformer block.

Parameters:

dim (int) – Number of input channels
num_heads (int) – Number of attention heads to be utilized
feat_size (Tuple[int, int]) – Input resolution
window_size (Tuple[int, int]) – Window size to be utilized
shift_size (Tuple[int, int]) – Shifting size to be used
mlp_ratio (float) – Ratio of the hidden dimension in the FFN to the input channels
proj_drop (float) – Dropout in input mapping
drop_attn (float) – Dropout rate of attention map
drop_path (float) – Dropout in main path
extra_norm (bool) – Insert extra norm on ‘main’ branch if True
sequential_attn (bool) – If true sequential self-attention is performed
norm_layer (Type[nn.Module]) – Type of normalization layer to be utilized

dim: int#

feat_size: Tuple[int, int]#

target_shift_size: Tuple[int, int] = (0, 0)#

window_area#

init_values: float | None = 0#

attn#

norm1#

drop_path1#

mlp#

norm2#

drop_path2#

norm3#

_calc_window_shift(target_window_size)#

_make_attention_mask() → None#

Method generates the attention mask used in shift case.
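As a rough picture of what a shift-case mask contains, the sketch below follows the usual Swin recipe. It labels the regions of the cyclically shifted feature map, partitions the labels into windows, and penalizes attention between positions whose labels differ. The helper name, the region labelling and the fill value -100.0 are assumptions for illustration and are not taken from this module.

```python
def shifted_window_mask(feat_size, window_size, shift_size, fill=-100.0):
    """Return one N x N additive mask per window (N = window height * width).

    Entries are 0.0 where two positions belong to the same region of the
    shifted feature map and `fill` where they do not.
    """
    height, width = feat_size
    wh, ww = window_size
    sh, sw = shift_size

    def band(pos, size, win, shift):
        # Three bands per axis: untouched area, last window minus shift, wrapped part
        if pos < size - win:
            return 0
        if pos < size - shift:
            return 1
        return 2

    region = [
        [3 * band(y, height, wh, sh) + band(x, width, ww, sw) for x in range(width)]
        for y in range(height)
    ]
    masks = []
    for top in range(0, height, wh):
        for left in range(0, width, ww):
            ids = [region[y][x]
                   for y in range(top, top + wh)
                   for x in range(left, left + ww)]
            masks.append([[0.0 if a == b else fill for b in ids] for a in ids])
    return masks


masks = shifted_window_mask((4, 4), (2, 2), (1, 1))
print(len(masks))  # one mask per window: (4 // 2) * (4 // 2) = 4
print(masks[0])    # first window lies in a single region, so nothing is masked
```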

init_weights()#

update_input_size(new_window_size: Tuple[int, int], new_feat_size: Tuple[int, int]) → None#

Method updates the image resolution to be processed and the window size and, as a consequence, the pair-wise relative positions.

Parameters:

new_window_size (Tuple[int, int]) – New window size
new_feat_size (Tuple[int, int]) – New input resolution

_shifted_window_attn(x)#

forward(x: torch.Tensor) → torch.Tensor#

Forward pass.

Parameters:

x (torch.Tensor) – Input tensor of the shape [B, C, H, W]

Returns:

output (torch.Tensor) – Output tensor of the shape [B, C, H, W]

class credit.models.swin.PatchMerging(dim: int, norm_layer: Type[torch.nn.Module] = nn.LayerNorm)#

Bases: torch.nn.Module

This class implements the patch merging as a strided convolution with a normalization before.

Parameters:

dim (int) – Number of input channels
norm_layer (Type[nn.Module]) – Type of normalization layer to be utilized

norm#

reduction#

forward(x: torch.Tensor) → torch.Tensor#

Forward pass.

Parameters:

x (torch.Tensor) – Input tensor of the shape [B, C, H, W]

Returns:

output (torch.Tensor) – Output tensor of the shape [B, 2 * C, H // 2, W // 2]
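The halving of H and W can be pictured as gathering each non-overlapping 2 x 2 neighbourhood into one cell. The sketch below does this for a single-channel grid. The helper name and the ordering of the four gathered values are assumptions. In the usual Swin formulation the resulting 4 * C features are normalized and then linearly reduced to 2 * C, which is consistent with the documented output shape.

```python
def gather_2x2(grid):
    """Group each non-overlapping 2 x 2 neighbourhood of an H x W grid into one cell."""
    height, width = len(grid), len(grid[0])
    assert height % 2 == 0 and width % 2 == 0, "H and W must be even"
    return [
        [
            (grid[y][x], grid[y + 1][x], grid[y][x + 1], grid[y + 1][x + 1])
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


grid = [[0, 1, 2, 3],
        [4, 5, 6, 7]]           # H=2, W=4, one channel
merged = gather_2x2(grid)
print(merged)                   # [[(0, 4, 1, 5), (2, 6, 3, 7)]]
assert (len(merged), len(merged[0])) == (1, 2)  # H // 2, W // 2
```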

class credit.models.swin.PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None)#

Bases: torch.nn.Module

2D Image to Patch Embedding

img_size#

patch_size#

grid_size#

num_patches#

proj#

norm#

forward(x)#

class credit.models.swin.SwinTransformerV2CrStage(embed_dim: int, depth: int, downscale: bool, num_heads: int, feat_size: Tuple[int, int], window_size: Tuple[int, int], mlp_ratio: float = 4.0, init_values: float | None = 0.0, proj_drop: float = 0.0, drop_attn: float = 0.0, drop_path: List[float] | float = 0.0, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, extra_norm_period: int = 0, extra_norm_stage: bool = False, sequential_attn: bool = False, rel_pos: bool = True, grad_checkpointing: bool = False)#

Bases: torch.nn.Module

This class implements a stage of the Swin transformer including multiple layers.

Parameters:

embed_dim (int) – Number of input channels
depth (int) – Depth of the stage (number of layers)
downscale (bool) – If true input is downsampled (see Fig. 3 of the V1 paper)
feat_size (Tuple[int, int]) – input feature map size (H, W)
num_heads (int) – Number of attention heads to be utilized
window_size (Tuple[int, int]) – Window size to be utilized
mlp_ratio (float) – Ratio of the hidden dimension in the FFN to the input channels
proj_drop (float) – Dropout in input mapping
drop_attn (float) – Dropout rate of attention map
drop_path (List[float] | float) – Dropout in main path
norm_layer (Type[nn.Module]) – Type of normalization layer to be utilized. Default: nn.LayerNorm
extra_norm_period (int) – Insert extra norm layer on main branch every N (period) blocks
extra_norm_stage (bool) – End each stage with an extra norm layer in main branch
sequential_attn (bool) – If true sequential self-attention is performed

downscale: bool#

feat_size: Tuple[int, int]#

grad_checkpointing = False#

blocks#

update_input_size(new_window_size: int, new_feat_size: Tuple[int, int]) → None#

Method updates the resolution to utilize and the window size and, as a consequence, the pair-wise relative positions.

Parameters:

new_window_size (int) – New window size
new_feat_size (Tuple[int, int]) – New input resolution

forward(x: torch.Tensor) → torch.Tensor#

Forward pass.

Parameters:

x (torch.Tensor) – Input tensor of the shape [B, C, H, W] or [B, L, C]

Returns:

output (torch.Tensor) – Output tensor of the shape [B, 2 * C, H // 2, W // 2]
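The documented output shape applies to a stage that downscales. A small sketch of how channel count and feature-map size could evolve across stages is given below. It assumes that patch embedding divides the image size by patch_size and that every stage except the first is built with downscale=True. Both are assumptions following the common Swin layout, and `stage_shapes` is a hypothetical helper.

```python
def stage_shapes(img_size, patch_size, embed_dim, depths):
    """Return (channels, H, W) at the output of each stage."""
    h, w = img_size[0] // patch_size, img_size[1] // patch_size
    channels = embed_dim
    shapes = []
    for index, _depth in enumerate(depths):
        if index > 0:  # assumption: the first stage keeps resolution and width
            channels, h, w = channels * 2, h // 2, w // 2
        shapes.append((channels, h, w))
    return shapes


# Defaults from the SwinTransformerV2Cr signature
for shape in stage_shapes((224, 224), 4, 96, (2, 2, 6, 2)):
    print(shape)
# (96, 56, 56), (192, 28, 28), (384, 14, 14), (768, 7, 7)
```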

class credit.models.swin.SwinTransformerV2Cr(img_size: Tuple[int, int] = (224, 224), patch_size: int = 4, window_size: int | None = None, img_window_ratio: int = 32, channels: int = 4, levels: int = 15, surface_channels: int = 7, input_only_channels: int = 3, output_only_channels: int = 0, frames: int = 1, embed_dim: int = 96, depths: Tuple[int, Ellipsis] = (2, 2, 6, 2), num_heads: Tuple[int, Ellipsis] = (3, 6, 12, 24), mlp_ratio: float = 4.0, init_values: float | None = 0.0, drop_rate: float = 0.0, proj_drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, extra_norm_period: int = 0, extra_norm_stage: bool = False, sequential_attn: bool = False, global_pool: str = 'avg', weight_init='skip', full_pos_embed: bool = False, rel_pos: bool = True, checkpoint_stages: bool = False, residual: bool = False, use_spectral_norm: bool = False, padding_conf: dict = None, post_conf: dict = None, **kwargs: Any)#

Bases: credit.models.base_model.BaseModel

Swin Transformer V2

A PyTorch implementation of "Swin Transformer V2: Scaling Up Capacity and Resolution" (https://arxiv.org/pdf/2111.09883).

Parameters:

img_size – Input resolution.
window_size – Window size. If None, it is derived as img_size // img_window_ratio.
img_window_ratio – Ratio of image size to window size, used as the divisor when window_size is None.
patch_size – Patch size.
in_chans – Number of input channels.
depths – Depth of the stage (number of layers).
num_heads – Number of attention heads to be utilized.
embed_dim – Patch embedding dimension.
num_classes – Number of output classes.
mlp_ratio – Ratio of the hidden dimension in the FFN to the input channels.
drop_rate – Dropout rate.
proj_drop_rate – Projection dropout rate.
attn_drop_rate – Dropout rate of attention map.
drop_path_rate – Stochastic depth rate.
norm_layer – Type of normalization layer to be utilized.
extra_norm_period – Insert extra norm layer on main branch every N (period) blocks in stage
extra_norm_stage – End each stage with an extra norm layer in main branch
sequential_attn – If true sequential self-attention is performed.
padding_conf (dict) – padding configuration
post_conf (dict) – configuration for postblock processing

use_padding#

patch_size: int = 4#

img_size: Tuple[int, int] = (224, 224)#

window_size: int#

num_features: int = 96#

frames = 1#

in_chans = 70#

out_chans = 67#

feature_info = []#

full_pos_embed = False#

checkpoint_stages = False#

residual = False#

depth#

use_post_block#

patch_embed#

stages#

head#

use_spectral_norm = False#
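The listed values in_chans = 70 and out_chans = 67 are consistent with the constructor defaults if upper-air variables are counted once per level. The formula below is inferred from those defaults only, not read from the source code, and it ignores frames. `channel_counts` is a hypothetical helper.

```python
def channel_counts(channels, levels, surface_channels,
                   input_only_channels, output_only_channels):
    """Return (in_chans, out_chans) under the assumed channel layout."""
    upper_air = channels * levels  # one channel per variable and level
    in_chans = upper_air + surface_channels + input_only_channels
    out_chans = upper_air + surface_channels + output_only_channels
    return in_chans, out_chans


# Defaults: channels=4, levels=15, surface_channels=7,
# input_only_channels=3, output_only_channels=0
assert channel_counts(4, 15, 7, 3, 0) == (70, 67)
```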

forward_features(x: torch.Tensor) → torch.Tensor#

forward_head(x: torch.Tensor) → torch.Tensor#

forward(x: torch.Tensor) → torch.Tensor#

update_input_size(new_img_size: Tuple[int, int] | None = None, new_window_size: int | None = None, img_window_ratio: int = 32) → None#

Method updates the image resolution to be processed and the window size and, as a consequence, the pair-wise relative positions.

Parameters:

new_window_size (Optional[int]) – New window size, if None based on new_img_size // img_window_ratio
new_img_size (Optional[Tuple[int, int]]) – New input resolution, if None current resolution is used
img_window_ratio (int) – Divisor for calculating window size from image size
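When no window size is given, it is derived from the image size and img_window_ratio. A minimal sketch of that integer division, applied per dimension, follows. `derive_window_size` is a hypothetical helper, and the per-dimension tuple result is an assumption.

```python
def derive_window_size(img_size, img_window_ratio=32):
    """Window size per dimension as img_size // img_window_ratio."""
    return tuple(side // img_window_ratio for side in img_size)


print(derive_window_size((224, 224)))   # (7, 7)
print(derive_window_size((640, 1280)))  # (20, 40), a hypothetical 640 x 1280 grid
```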

group_matcher(coarse=False)#

set_grad_checkpointing(enable=True)#

get_classifier() → torch.nn.Module#

Method returns the classification head of the model.

Returns:

head (nn.Module) – Current classification head

reset_classifier(num_classes: int, global_pool: str | None = None) → None#

Method resets the classification head

Parameters:

num_classes (int) – Number of classes to be predicted
global_pool (str) – Unused

credit.models.swin.init_weights(module: torch.nn.Module, name: str = '')#

credit.models.swin.image_height = 640#