credit.models.swin
==================

.. py:module:: credit.models.swin


Attributes
----------

.. autoapisummary::

   credit.models.swin.logger
   credit.models.swin.image_height


Classes
-------

.. autoapisummary::

   credit.models.swin.WindowMultiHeadAttentionNoPos
   credit.models.swin.WindowMultiHeadAttention
   credit.models.swin.SwinTransformerV2CrBlock
   credit.models.swin.PatchMerging
   credit.models.swin.PatchEmbed
   credit.models.swin.SwinTransformerV2CrStage
   credit.models.swin.SwinTransformerV2Cr


Functions
---------

.. autoapisummary::

   credit.models.swin.apply_spectral_norm
   credit.models.swin.circular_pad1d
   credit.models.swin.bchw_to_bhwc
   credit.models.swin.bhwc_to_bchw
   credit.models.swin.swin_from_yaml
   credit.models.swin.swinv2net
   credit.models.swin.window_partition
   credit.models.swin.window_reverse
   credit.models.swin.init_weights


Module Contents
---------------

.. py:data:: logger

.. py:function:: apply_spectral_norm(model)

.. py:function:: circular_pad1d(x, pad)

.. py:function:: bchw_to_bhwc(x: torch.Tensor) -> torch.Tensor

   Permutes a tensor from the shape (B, C, H, W) to (B, H, W, C).


.. py:function:: bhwc_to_bchw(x: torch.Tensor) -> torch.Tensor

   Permutes a tensor from the shape (B, H, W, C) to (B, C, H, W).
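
   A minimal sketch of the two layout conversions, assuming they are plain ``permute`` calls as the descriptions suggest. The names ``to_channels_last`` and ``to_channels_first`` are illustrative stand-ins, not functions of this module.

   .. code-block:: python

      import torch


      def to_channels_last(x: torch.Tensor) -> torch.Tensor:
          # (B, C, H, W) -> (B, H, W, C)
          return x.permute(0, 2, 3, 1)


      def to_channels_first(x: torch.Tensor) -> torch.Tensor:
          # (B, H, W, C) -> (B, C, H, W)
          return x.permute(0, 3, 1, 2)


      x = torch.randn(2, 3, 8, 16)
      y = to_channels_last(x)
      x_back = to_channels_first(y)

   The two permutations are inverses of each other, so a round trip returns the original tensor. ``permute`` returns a view, so call ``.contiguous()`` if a later ``view`` requires contiguous memory.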


.. py:function:: swin_from_yaml(fname, checkpoint_stages=False)

.. py:function:: swinv2net(params, checkpoint_stages=False)

.. py:function:: window_partition(x, window_size: Tuple[int, int])

   :param x: Input tensor of the shape (B, H, W, C)
   :param window_size: Window size as (height, width)
   :type window_size: Tuple[int, int]

   :returns: (num_windows * B, window_size[0], window_size[1], C)
   :rtype: windows


.. py:function:: window_reverse(windows, window_size: Tuple[int, int], img_size: Tuple[int, int])

   :param windows: (num_windows * B, window_size[0], window_size[1], C)
   :param window_size: Window size
   :type window_size: Tuple[int, int]
   :param img_size: Image size
   :type img_size: Tuple[int, int]

   :returns: (B, H, W, C)
   :rtype: x
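
   The sketch below shows one way to implement the partition and its inverse for non-overlapping windows, assuming ``H`` and ``W`` are divisible by the window size. ``partition`` and ``reverse`` are hypothetical stand-ins for ``window_partition`` and ``window_reverse``; the module's own implementation may differ in detail.

   .. code-block:: python

      from typing import Tuple

      import torch


      def partition(x: torch.Tensor, window_size: Tuple[int, int]) -> torch.Tensor:
          B, H, W, C = x.shape
          wh, ww = window_size
          # Split both spatial axes into (number of windows, window extent).
          x = x.view(B, H // wh, wh, W // ww, ww, C)
          # Bring the two window-index axes together, then fold them into the batch.
          return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, wh, ww, C)


      def reverse(windows: torch.Tensor, window_size: Tuple[int, int],
                  img_size: Tuple[int, int]) -> torch.Tensor:
          H, W = img_size
          wh, ww = window_size
          C = windows.shape[-1]
          x = windows.view(-1, H // wh, W // ww, wh, ww, C)
          # Undo the permutation used in ``partition``.
          return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, H, W, C)


      x = torch.arange(2 * 8 * 12 * 3, dtype=torch.float32).view(2, 8, 12, 3)
      windows = partition(x, (4, 6))
      restored = reverse(windows, (4, 6), (8, 12))

   With an 8 x 12 feature map and 4 x 6 windows, each sample yields 2 x 2 windows, so the leading dimension of ``windows`` is ``4 * B``.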


.. py:class:: WindowMultiHeadAttentionNoPos(dim: int, num_heads: int, window_size: Tuple[int, int], drop_attn: float = 0.0, drop_proj: float = 0.0, sequential_attn: bool = False)

   Bases: :py:obj:`torch.nn.Module`


   This class implements window-based Multi-Head-Attention without a relative position bias.

   :param dim: Number of input features
   :type dim: int
   :param window_size: Window size
   :type window_size: Tuple[int, int]
   :param num_heads: Number of attention heads
   :type num_heads: int
   :param drop_attn: Dropout rate of attention map
   :type drop_attn: float
   :param drop_proj: Dropout rate after projection
   :type drop_proj: float
   :param sequential_attn: If True, sequential self-attention is performed
   :type sequential_attn: bool


   .. py:attribute:: in_features
      :type:  int


   .. py:attribute:: window_size
      :type:  Tuple[int, int]


   .. py:attribute:: num_heads
      :type:  int


   .. py:attribute:: sequential_attn
      :type:  bool
      :value: False


   .. py:attribute:: qkv


   .. py:attribute:: attn_drop


   .. py:attribute:: proj


   .. py:attribute:: proj_drop


   .. py:attribute:: logit_scale


   .. py:method:: update_input_size(new_window_size: int, **kwargs: Any) -> None

      Method updates the window size used by the attention layer.

      :param new_window_size: New window size
      :type new_window_size: int
      :param kwargs: Unused
      :type kwargs: Any


   .. py:method:: forward(x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor

      Forward pass.

      :param x: Input tensor of the shape (B * windows, N, C)
      :type x: torch.Tensor
      :param mask: Attention mask for the shift case
      :type mask: Optional[torch.Tensor]

      :returns: Output tensor of the shape (B * windows, N, C)


.. py:class:: WindowMultiHeadAttention(dim: int, num_heads: int, window_size: Tuple[int, int], drop_attn: float = 0.0, drop_proj: float = 0.0, meta_hidden_dim: int = 384, sequential_attn: bool = False)

   Bases: :py:obj:`torch.nn.Module`


   This class implements window-based Multi-Head-Attention with log-spaced continuous position bias.

   :param dim: Number of input features
   :type dim: int
   :param window_size: Window size
   :type window_size: Tuple[int, int]
   :param num_heads: Number of attention heads
   :type num_heads: int
   :param drop_attn: Dropout rate of attention map
   :type drop_attn: float
   :param drop_proj: Dropout rate after projection
   :type drop_proj: float
   :param meta_hidden_dim: Number of hidden features in the two layer MLP meta network
   :type meta_hidden_dim: int
   :param sequential_attn: If true sequential self-attention is performed
   :type sequential_attn: bool


   .. py:attribute:: in_features
      :type:  int


   .. py:attribute:: window_size
      :type:  Tuple[int, int]


   .. py:attribute:: num_heads
      :type:  int


   .. py:attribute:: sequential_attn
      :type:  bool
      :value: False


   .. py:attribute:: qkv


   .. py:attribute:: attn_drop


   .. py:attribute:: proj


   .. py:attribute:: proj_drop


   .. py:attribute:: meta_mlp


   .. py:attribute:: logit_scale


   .. py:method:: _make_pair_wise_relative_positions() -> None

      Method initializes the pair-wise relative positions to compute the positional biases.
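
      A minimal sketch of log-spaced pair-wise relative coordinates, assuming the Swin Transformer V2 convention ``sign(d) * log(1 + |d|)`` applied to the row and column offsets between every pair of tokens in a window. The function name ``log_spaced_relative_positions`` is hypothetical, and the actual buffer layout in this class may differ.

      .. code-block:: python

         import torch


         def log_spaced_relative_positions(window_size):
             wh, ww = window_size
             grid = torch.meshgrid(torch.arange(wh), torch.arange(ww), indexing="ij")
             coords = torch.stack(grid, dim=0).flatten(1)       # (2, N), N = wh * ww
             rel = coords[:, :, None] - coords[:, None, :]      # (2, N, N) signed offsets
             rel = rel.permute(1, 2, 0).reshape(-1, 2).float()  # (N * N, 2)
             # Compress large offsets while keeping their sign.
             return torch.sign(rel) * torch.log1p(rel.abs())


         rel_log = log_spaced_relative_positions((2, 3))

      In this scheme the ``(N * N, 2)`` coordinates would be fed to the meta network (``meta_mlp``), which produces one bias value per head and token pair.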


   .. py:method:: update_input_size(new_window_size: int, **kwargs: Any) -> None

      Method updates the window size and, with it, the pair-wise relative positions.

      :param new_window_size: New window size
      :type new_window_size: int
      :param kwargs: Unused
      :type kwargs: Any


   .. py:method:: _relative_positional_encodings() -> torch.Tensor

      Method computes the relative positional encodings

      :returns: Relative positional encodings
                (1, number of heads, window size ** 2, window size ** 2)
      :rtype: relative_position_bias (torch.Tensor)


   .. py:method:: forward(x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor

      Forward pass.

      :param x: Input tensor of the shape (B * windows, N, C)
      :type x: torch.Tensor
      :param mask: Attention mask for the shift case
      :type mask: Optional[torch.Tensor]

      :returns: Output tensor of the shape (B * windows, N, C)


.. py:class:: SwinTransformerV2CrBlock(dim: int, num_heads: int, feat_size: Tuple[int, int], window_size: Tuple[int, int], shift_size: Tuple[int, int] = (0, 0), mlp_ratio: float = 4.0, init_values: Optional[float] = 0, proj_drop: float = 0.0, drop_attn: float = 0.0, drop_path: float = 0.0, extra_norm: bool = False, sequential_attn: bool = False, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, rel_pos: bool = True)

   Bases: :py:obj:`torch.nn.Module`


   This class implements the Swin transformer block.

   :param dim: Number of input channels
   :type dim: int
   :param num_heads: Number of attention heads to be utilized
   :type num_heads: int
   :param feat_size: Input resolution
   :type feat_size: Tuple[int, int]
   :param window_size: Window size to be utilized
   :type window_size: Tuple[int, int]
   :param shift_size: Shifting size to be used
   :type shift_size: Tuple[int, int]
   :param mlp_ratio: Ratio of the hidden dimension in the FFN to the input channels
   :type mlp_ratio: float
   :param proj_drop: Dropout in input mapping
   :type proj_drop: float
   :param drop_attn: Dropout rate of attention map
   :type drop_attn: float
   :param drop_path: Dropout in main path
   :type drop_path: float
   :param extra_norm: Insert extra norm on 'main' branch if True
   :type extra_norm: bool
   :param sequential_attn: If true sequential self-attention is performed
   :type sequential_attn: bool
   :param norm_layer: Type of normalization layer to be utilized
   :type norm_layer: Type[nn.Module]


   .. py:attribute:: dim
      :type:  int


   .. py:attribute:: feat_size
      :type:  Tuple[int, int]


   .. py:attribute:: target_shift_size
      :type:  Tuple[int, int]
      :value: (0, 0)


   .. py:attribute:: window_area


   .. py:attribute:: init_values
      :type:  Optional[float]
      :value: 0


   .. py:attribute:: attn


   .. py:attribute:: norm1


   .. py:attribute:: drop_path1


   .. py:attribute:: mlp


   .. py:attribute:: norm2


   .. py:attribute:: drop_path2


   .. py:attribute:: norm3


   .. py:method:: _calc_window_shift(target_window_size)


   .. py:method:: _make_attention_mask() -> None

      Method generates the attention mask used in shift case.
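
      The sketch below follows the standard Swin construction of a shifted-window mask: label the regions created by the cyclic shift, partition the label map into windows, and penalize token pairs whose labels differ. It assumes a non-zero shift on both axes and a feature map divisible by the window size. The function name and the ``-100.0`` fill value are assumptions; the mask built by this class (which may also interact with circular padding) can differ.

      .. code-block:: python

         import torch


         def shifted_window_mask(feat_size, window_size, shift_size):
             H, W = feat_size
             wh, ww = window_size
             sh, sw = shift_size  # assumed to be > 0 on both axes
             # Label each of the nine regions produced by the cyclic shift.
             img_mask = torch.zeros(1, H, W, 1)
             region = 0
             for h in (slice(0, -wh), slice(-wh, -sh), slice(-sh, None)):
                 for w in (slice(0, -ww), slice(-ww, -sw), slice(-sw, None)):
                     img_mask[:, h, w, :] = region
                     region += 1
             # Partition the label map into windows: (num_windows, wh * ww).
             windows = img_mask.view(1, H // wh, wh, W // ww, ww, 1)
             windows = windows.permute(0, 1, 3, 2, 4, 5).reshape(-1, wh * ww)
             # Token pairs from different regions receive a large negative bias.
             diff = windows.unsqueeze(1) - windows.unsqueeze(2)
             return torch.zeros_like(diff).masked_fill(diff != 0, -100.0)


         mask = shifted_window_mask((8, 8), (4, 4), (2, 2))

      The result has one ``(window_area, window_area)`` matrix per window. Windows that lie entirely inside one region get an all-zero matrix, so only windows straddling the shifted border are affected.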


   .. py:method:: init_weights()


   .. py:method:: update_input_size(new_window_size: Tuple[int, int], new_feat_size: Tuple[int, int]) -> None

      Method updates the resolution to be processed and the window size, and with them the pair-wise relative positions.

      :param new_window_size: New window size
      :type new_window_size: Tuple[int, int]
      :param new_feat_size: New input resolution
      :type new_feat_size: Tuple[int, int]


   .. py:method:: _shifted_window_attn(x)


   .. py:method:: forward(x: torch.Tensor) -> torch.Tensor

      Forward pass.

      :param x: Input tensor of the shape [B, C, H, W]
      :type x: torch.Tensor

      :returns: Output tensor of the shape [B, C, H, W]
      :rtype: output (torch.Tensor)


.. py:class:: PatchMerging(dim: int, norm_layer: Type[torch.nn.Module] = nn.LayerNorm)

   Bases: :py:obj:`torch.nn.Module`


   This class implements patch merging as a strided convolution preceded by a normalization layer.

   :param dim: Number of input channels
   :type dim: int
   :param norm_layer: Type of normalization layer to be utilized.
   :type norm_layer: Type[nn.Module]


   .. py:attribute:: norm


   .. py:attribute:: reduction


   .. py:method:: forward(x: torch.Tensor) -> torch.Tensor

      Forward pass.

      :param x: Input tensor of the shape [B, C, H, W]
      :type x: torch.Tensor

      :returns: Output tensor of the shape [B, 2 * C, H // 2, W // 2]
      :rtype: output (torch.Tensor)
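
      A shape-level sketch of patch merging, assuming each 2 x 2 neighbourhood is gathered into ``4 * C`` features, normalized, and projected linearly to ``2 * C``, which is equivalent to a 2 x 2 convolution with stride 2. The class name ``PatchMergingSketch`` and the ``bias=False`` choice are assumptions; only the attribute names ``norm`` and ``reduction`` and the documented shapes come from this page.

      .. code-block:: python

         import torch
         import torch.nn as nn


         class PatchMergingSketch(nn.Module):
             def __init__(self, dim: int):
                 super().__init__()
                 self.norm = nn.LayerNorm(4 * dim)
                 self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

             def forward(self, x: torch.Tensor) -> torch.Tensor:
                 B, C, H, W = x.shape
                 x = x.permute(0, 2, 3, 1)                   # (B, H, W, C)
                 x = x.reshape(B, H // 2, 2, W // 2, 2, C)   # split into 2 x 2 cells
                 x = x.permute(0, 1, 3, 2, 4, 5).flatten(3)  # (B, H/2, W/2, 4C)
                 x = self.reduction(self.norm(x))            # (B, H/2, W/2, 2C)
                 return x.permute(0, 3, 1, 2)                # (B, 2C, H/2, W/2)


         out = PatchMergingSketch(dim=8)(torch.randn(2, 8, 16, 12))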


.. py:class:: PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None)

   Bases: :py:obj:`torch.nn.Module`


   2D Image to Patch Embedding
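
   A minimal sketch of a 2D patch embedding, assuming the common approach of a convolution whose kernel size and stride both equal the patch size. The values mirror the constructor defaults shown above; how this class derives ``grid_size`` and ``num_patches`` is an assumption based on that convention.

   .. code-block:: python

      import torch
      import torch.nn as nn

      img_size, patch_size, in_chans, embed_dim = (224, 224), (16, 16), 3, 768

      # One output position per non-overlapping patch.
      proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

      grid_size = (img_size[0] // patch_size[0], img_size[1] // patch_size[1])
      num_patches = grid_size[0] * grid_size[1]

      tokens = proj(torch.randn(1, in_chans, *img_size))  # (1, embed_dim, 14, 14)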


   .. py:attribute:: img_size


   .. py:attribute:: patch_size


   .. py:attribute:: grid_size


   .. py:attribute:: num_patches


   .. py:attribute:: proj


   .. py:attribute:: norm


   .. py:method:: forward(x)


.. py:class:: SwinTransformerV2CrStage(embed_dim: int, depth: int, downscale: bool, num_heads: int, feat_size: Tuple[int, int], window_size: Tuple[int, int], mlp_ratio: float = 4.0, init_values: Optional[float] = 0.0, proj_drop: float = 0.0, drop_attn: float = 0.0, drop_path: Union[List[float], float] = 0.0, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, extra_norm_period: int = 0, extra_norm_stage: bool = False, sequential_attn: bool = False, rel_pos: bool = True, grad_checkpointing: bool = False)

   Bases: :py:obj:`torch.nn.Module`


   This class implements a stage of the Swin transformer including multiple layers.

   :param embed_dim: Number of input channels
   :type embed_dim: int
   :param depth: Depth of the stage (number of layers)
   :type depth: int
   :param downscale: If True, the input is downsampled (see Fig. 3 of the V1 paper)
   :type downscale: bool
   :param feat_size: input feature map size (H, W)
   :type feat_size: Tuple[int, int]
   :param num_heads: Number of attention heads to be utilized
   :type num_heads: int
   :param window_size: Window size to be utilized
   :type window_size: Tuple[int, int]
   :param mlp_ratio: Ratio of the hidden dimension in the FFN to the input channels
   :type mlp_ratio: float
   :param proj_drop: Dropout in input mapping
   :type proj_drop: float
   :param drop_attn: Dropout rate of attention map
   :type drop_attn: float
   :param drop_path: Dropout in main path
   :type drop_path: Union[List[float], float]
   :param norm_layer: Type of normalization layer to be utilized. Default: nn.LayerNorm
   :type norm_layer: Type[nn.Module]
   :param extra_norm_period: Insert extra norm layer on main branch every N (period) blocks
   :type extra_norm_period: int
   :param extra_norm_stage: End each stage with an extra norm layer in main branch
   :type extra_norm_stage: bool
   :param sequential_attn: If true sequential self-attention is performed
   :type sequential_attn: bool


   .. py:attribute:: downscale
      :type:  bool


   .. py:attribute:: feat_size
      :type:  Tuple[int, int]


   .. py:attribute:: grad_checkpointing
      :value: False


   .. py:attribute:: blocks


   .. py:method:: update_input_size(new_window_size: int, new_feat_size: Tuple[int, int]) -> None

      Method updates the resolution to be processed and the window size, and with them the pair-wise relative positions.

      :param new_window_size: New window size
      :type new_window_size: int
      :param new_feat_size: New input resolution
      :type new_feat_size: Tuple[int, int]


   .. py:method:: forward(x: torch.Tensor) -> torch.Tensor

      Forward pass.

      :param x: Input tensor of the shape [B, C, H, W] or [B, L, C]
      :type x: torch.Tensor

      :returns: Output tensor of the shape [B, 2 * C, H // 2, W // 2]
      :rtype: output (torch.Tensor)


.. py:class:: SwinTransformerV2Cr(img_size: Tuple[int, int] = (224, 224), patch_size: int = 4, window_size: Optional[int] = None, img_window_ratio: int = 32, channels: int = 4, levels: int = 15, surface_channels: int = 7, input_only_channels: int = 3, output_only_channels: int = 0, frames: int = 1, embed_dim: int = 96, depths: Tuple[int, Ellipsis] = (2, 2, 6, 2), num_heads: Tuple[int, Ellipsis] = (3, 6, 12, 24), mlp_ratio: float = 4.0, init_values: Optional[float] = 0.0, drop_rate: float = 0.0, proj_drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_layer: Type[torch.nn.Module] = nn.LayerNorm, extra_norm_period: int = 0, extra_norm_stage: bool = False, sequential_attn: bool = False, global_pool: str = 'avg', weight_init='skip', full_pos_embed: bool = False, rel_pos: bool = True, checkpoint_stages: bool = False, residual: bool = False, use_spectral_norm: bool = False, padding_conf: dict = None, post_conf: dict = None, **kwargs: Any)

   Bases: :py:obj:`credit.models.base_model.BaseModel`


   Swin Transformer V2.

   A PyTorch implementation of "Swin Transformer V2: Scaling Up Capacity and Resolution":
   https://arxiv.org/pdf/2111.09883

   :param img_size: Input resolution.
   :param window_size: Window size. If None, computed as ``img_size // img_window_ratio``.
   :param img_window_ratio: Ratio of image size to window size, used as the divisor when ``window_size`` is None.
   :param patch_size: Patch size.
   :param in_chans: Number of input channels.
   :param depths: Depth of the stage (number of layers).
   :param num_heads: Number of attention heads to be utilized.
   :param embed_dim: Patch embedding dimension.
   :param num_classes: Number of output classes.
   :param mlp_ratio: Ratio of the hidden dimension in the FFN to the input channels.
   :param drop_rate: Dropout rate.
   :param proj_drop_rate: Projection dropout rate.
   :param attn_drop_rate: Dropout rate of attention map.
   :param drop_path_rate: Stochastic depth rate.
   :param norm_layer: Type of normalization layer to be utilized.
   :param extra_norm_period: Insert extra norm layer on main branch every N (period) blocks in stage
   :param extra_norm_stage: End each stage with an extra norm layer in main branch
   :param sequential_attn: If true sequential self-attention is performed.
   :param padding_conf: padding configuration
   :type padding_conf: dict
   :param post_conf: configuration for postblock processing
   :type post_conf: dict


   .. py:attribute:: use_padding


   .. py:attribute:: patch_size
      :type:  int
      :value: 4


   .. py:attribute:: img_size
      :type:  Tuple[int, int]
      :value: (224, 224)


   .. py:attribute:: window_size
      :type:  int


   .. py:attribute:: num_features
      :type:  int
      :value: 96


   .. py:attribute:: frames
      :value: 1


   .. py:attribute:: in_chans
      :value: 70


   .. py:attribute:: out_chans
      :value: 67
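
   The documented default values of ``in_chans`` and ``out_chans`` are consistent with the channel arithmetic below over the constructor defaults. This is an inference from the numbers on this page, not a statement of the actual implementation; in particular, how ``frames`` enters cannot be seen from the default ``frames=1``.

   .. code-block:: python

      # Constructor defaults taken from the class signature above.
      channels, levels = 4, 15
      surface_channels, input_only_channels, output_only_channels = 7, 3, 0

      # Assumed formula: upper-air variables on every level plus single-level fields.
      in_chans = channels * levels + surface_channels + input_only_channels
      out_chans = channels * levels + surface_channels + output_only_channels

      print(in_chans, out_chans)  # 70 67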


   .. py:attribute:: feature_info
      :value: []


   .. py:attribute:: full_pos_embed
      :value: False


   .. py:attribute:: checkpoint_stages
      :value: False


   .. py:attribute:: residual
      :value: False


   .. py:attribute:: depth


   .. py:attribute:: use_post_block


   .. py:attribute:: patch_embed


   .. py:attribute:: stages


   .. py:attribute:: head


   .. py:attribute:: use_spectral_norm
      :value: False


   .. py:method:: forward_features(x: torch.Tensor) -> torch.Tensor


   .. py:method:: forward_head(x: torch.Tensor) -> torch.Tensor


   .. py:method:: forward(x: torch.Tensor) -> torch.Tensor


   .. py:method:: update_input_size(new_img_size: Optional[Tuple[int, int]] = None, new_window_size: Optional[int] = None, img_window_ratio: int = 32) -> None

      Method updates the image resolution to be processed and the window size, and with them the pair-wise relative positions.

      :param new_window_size: New window size; if None, computed as ``new_img_size // img_window_ratio``
      :type new_window_size: Optional[int]
      :param new_img_size: New input resolution, if None current resolution is used
      :type new_img_size: Optional[Tuple[int, int]]
      :param img_window_ratio: divisor for calculating window size from image size
      :type img_window_ratio: int
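
      A small sketch of the fallback described above, assuming the window size is derived per axis by integer division of the image size. The helper name ``derive_window_size`` is hypothetical, and returning a tuple is an assumption, since ``window_size`` is documented as ``int`` on this class.

      .. code-block:: python

         def derive_window_size(img_size, img_window_ratio=32):
             # Integer division per axis: larger ratios give smaller windows.
             return tuple(s // img_window_ratio for s in img_size)


         default_window = derive_window_size((224, 224))
         wide_window = derive_window_size((640, 1280), img_window_ratio=32)

         print(default_window)  # (7, 7)
         print(wide_window)     # (20, 40)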


   .. py:method:: group_matcher(coarse=False)


   .. py:method:: set_grad_checkpointing(enable=True)


   .. py:method:: get_classifier() -> torch.nn.Module

      Method returns the classification head of the model.

      :returns: Current classification head
      :rtype: head (nn.Module)


   .. py:method:: reset_classifier(num_classes: int, global_pool: Optional[str] = None) -> None

      Method resets the classification head.

      :param num_classes: Number of classes to be predicted
      :type num_classes: int
      :param global_pool: Unused
      :type global_pool: Optional[str]


.. py:function:: init_weights(module: torch.nn.Module, name: str = '')

.. py:data:: image_height
   :value: 640