credit.attend
=============

.. py:module:: credit.attend


Attributes
----------

.. autoapisummary::

   credit.attend.print_once


Classes
-------

.. autoapisummary::

   credit.attend.AttentionConfig
   credit.attend.Attend


Functions
---------

.. autoapisummary::

   credit.attend.exists
   credit.attend.default
   credit.attend.once


Module Contents
---------------

.. py:class:: AttentionConfig

   Bases: :py:obj:`tuple`

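   A minimal sketch, assuming ``AttentionConfig`` is a
   ``collections.namedtuple`` (its documented base is ``tuple``) whose boolean
   fields mirror the ``enable_flash``, ``enable_math`` and
   ``enable_mem_efficient`` keyword arguments of PyTorch's
   ``torch.backends.cuda.sdp_kernel`` context manager. The ``cpu_config``
   value below is illustrative and not taken from the module.

   .. code-block:: python

      from collections import namedtuple

      # Hypothetical re-creation with the three documented field names.
      AttentionConfig = namedtuple(
          "AttentionConfig", ["enable_flash", "enable_math", "enable_mem_efficient"]
      )

      # Illustrative config that leaves every attention backend enabled.
      cpu_config = AttentionConfig(True, True, True)

      # Named fields can be forwarded as keyword arguments, e.g.
      # torch.backends.cuda.sdp_kernel(**cpu_config._asdict()).
      backend_kwargs = cpu_config._asdict()

   As a tuple subclass, an instance is immutable;
   ``cpu_config._replace(enable_flash=False)`` returns a modified copy instead.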

   .. py:attribute:: enable_flash


   .. py:attribute:: enable_math


   .. py:attribute:: enable_mem_efficient


.. py:function:: exists(val)

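   A hypothetical re-implementation, assuming ``exists`` does what its name
   suggests and tests ``val`` against ``None``; it is a sketch, not the
   module's source.

   .. code-block:: python

      def exists(val):
          # Anything other than None counts as present, including falsy values.
          return val is not None

   Comparing against ``None`` instead of relying on truthiness keeps values
   such as ``0`` or ``""`` from being treated as missing.
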
.. py:function:: default(val, d)

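   A minimal sketch, assuming ``default`` returns ``val`` when it is not
   ``None`` and the fallback ``d`` otherwise. ``exists`` is redefined here
   (with the assumed ``is not None`` semantics) so the block runs on its own.

   .. code-block:: python

      def exists(val):
          # Presence check: anything other than None counts as provided.
          return val is not None


      def default(val, d):
          # Fall back to ``d`` only when ``val`` was not supplied (is None).
          return val if exists(val) else d

   A typical (assumed) use is resolving an optional argument such as
   ``scale=None`` to a computed fallback value.
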
.. py:function:: once(fn)

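   A sketch of a run-once decorator, assuming ``once`` wraps ``fn`` so that
   only the first call executes it and later calls return ``None``. The
   ``*args``/``**kwargs`` forwarding is a generalization chosen for this
   example.

   .. code-block:: python

      from functools import wraps


      def once(fn):
          called = False

          @wraps(fn)
          def inner(*args, **kwargs):
              nonlocal called
              if called:
                  # Every call after the first is a no-op.
                  return None
              called = True
              return fn(*args, **kwargs)

          return inner

   The ``called`` flag lives in the closure, so each decorated function keeps
   its own independent state.
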
.. py:data:: print_once

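   A minimal sketch, assuming ``print_once`` is ``print`` wrapped by a
   run-once decorator like ``once``; the decorator is repeated here so the
   block is self-contained.

   .. code-block:: python

      from functools import wraps


      def once(fn):
          called = False

          @wraps(fn)
          def inner(*args, **kwargs):
              nonlocal called
              if called:
                  return None
              called = True
              return fn(*args, **kwargs)

          return inner


      # Only the first message is printed; later calls are silently dropped.
      print_once = once(print)

   This is useful for warnings that would otherwise repeat on every forward
   pass.
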
.. py:class:: Attend(dropout=0.0, flash=False, scale=None)

   Bases: :py:obj:`torch.nn.Module`


   Attention module applied to query, key and value tensors (``q``, ``k``,
   ``v``; see :py:meth:`forward` for their layout). The constructor arguments
   ``dropout``, ``flash`` and ``scale`` are stored on the instance as the
   attributes of the same name documented below.

   :ivar training: Whether this module is in training (``True``) or
                   evaluation (``False``) mode.
   :vartype training: bool

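   A hypothetical stand-in, ``AttendSketch``, that mirrors only the
   constructor signature shown above. It assumes ``attn_dropout`` is an
   ``nn.Dropout`` built from ``dropout``; the class name and that assumption
   are illustrative, not the module's source.

   .. code-block:: python

      import torch.nn as nn


      class AttendSketch(nn.Module):
          """Hypothetical stand-in mirroring Attend's constructor arguments."""

          def __init__(self, dropout=0.0, flash=False, scale=None):
              super().__init__()  # must run before submodules are assigned
              self.dropout = dropout
              self.flash = flash
              self.scale = scale
              # Assigning a Module attribute registers it as a submodule, so
              # .train() / .eval() on the parent also toggle this layer.
              self.attn_dropout = nn.Dropout(dropout)

   Because ``attn_dropout`` is registered, switching the parent to evaluation
   mode disables dropout without any extra bookkeeping.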

   .. py:attribute:: dropout
      :value: 0.0



   .. py:attribute:: scale
      :value: None



   .. py:attribute:: attn_dropout


   .. py:attribute:: flash
      :value: False



   .. py:attribute:: cpu_config


   .. py:attribute:: cuda_config
      :value: None



   .. py:method:: flash_attn(q, k, v)

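      A hedged sketch of a fused attention call, assuming ``flash_attn``
      delegates to ``torch.nn.functional.scaled_dot_product_attention``
      (available since PyTorch 2.0). The helper name ``flash_attn_sketch``
      and its ``dropout``/``training`` parameters are hypothetical.

      .. code-block:: python

         import torch.nn.functional as F


         def flash_attn_sketch(q, k, v, dropout=0.0, training=False):
             """Hypothetical fused attention via PyTorch's SDPA kernel.

             q: (b, h, i, d); k, v: (b, h, j, d). Uses the default 1/sqrt(d) scale.
             """
             # Apply dropout only while training, matching nn.Dropout semantics.
             return F.scaled_dot_product_attention(
                 q, k, v, dropout_p=dropout if training else 0.0
             )

      PyTorch picks the kernel (flash, memory-efficient or math) automatically;
      on CUDA, a backend restriction such as the one :py:class:`AttentionConfig`
      describes would be applied around this call.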

   .. py:method:: forward(q, k, v)

      Tensor dimensions use Einstein notation:

      * ``b``: batch
      * ``h``: heads
      * ``n``, ``i``, ``j``: sequence length (base sequence length, source, target)
      * ``d``: feature dimension

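      A minimal non-flash sketch written in the notation above, assuming
      standard softmax attention with a default ``1/sqrt(d)`` scale when none
      is given. The function name ``attend_sketch`` is hypothetical.

      .. code-block:: python

         import torch


         def attend_sketch(q, k, v, scale=None):
             """Hypothetical attention; q: (b, h, i, d), k/v: (b, h, j, d)."""
             # Assumed default scale when none is supplied.
             if scale is None:
                 scale = q.shape[-1] ** -0.5
             # Similarity logits between every query i and key j: (b, h, i, j).
             sim = torch.einsum("bhid,bhjd->bhij", q, k) * scale
             # Normalize over the key axis j.
             attn = sim.softmax(dim=-1)
             # Weighted sum of values, back to (b, h, i, d).
             return torch.einsum("bhij,bhjd->bhid", attn, v)

      The einsum subscripts spell out the shape contract directly, which is why
      ``i`` and ``j`` may differ.
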


