Found this half-page note I wrote ~6 years ago. It describes essentially linear attention, half a year before the "Transformers are RNNs" paper came out. Sadly I didn't take it too seriously at the time, because I had no use cases for it and was also too busy with GANs.
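For context, the idea the note apparently anticipated — replacing softmax attention with a kernel feature map so the key–value sum can be precomputed in O(n) — can be sketched in a few lines of NumPy. This is a minimal illustration of linear attention as popularized by the "Transformers are RNNs" paper, not the note's actual content; the function names are mine.

```python
import numpy as np

def feature_map(x):
    # A positive feature map phi(x) = elu(x) + 1, one common choice
    # for linearized attention (assumption: the note may have differed).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(n) attention: compute phi(Q) (phi(K)^T V) instead of
    (phi(Q) phi(K)^T) V, avoiding the n x n attention matrix."""
    Qp, Kp = feature_map(Q), feature_map(K)
    S = Kp.T @ V                       # (d, d_v) summary, size independent of n
    z = Kp.sum(axis=0)                 # (d,) normalizer accumulator
    return (Qp @ S) / (Qp @ z)[:, None]
```

Because `S` and `z` are running sums over positions, the causal variant can be unrolled step by step — which is exactly the "Transformers are RNNs" observation.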
From X


