Beyond attention [& transformers]
Hyena is a convolutional layer for LLMs that can shrink the gap with attention, while scaling *subquadratically* in seq len (eg train a lot faster @ 64k + train 100k+ tokens!) 2/
— Michael Poli (@MichaelPoli6) March 7, 2023
blogs: https://t.co/DIeS1kfyte, https://t.co/FE8BgZYzTX
code: https://t.co/ss9n5bxtDP pic.twitter.com/4yCzbJWlLJ
Abstract of the article linked above:
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting ...
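To make the scaling contrast concrete, here is a minimal sketch of the primitive that lets a long convolution run in O(L log L) time: multiply in the frequency domain with an FFT instead of summing over every pair of positions. This is not the Hyena operator itself, only an illustration of FFT-based causal convolution. The function names (`fft`, `causal_conv_fft`, `causal_conv_direct`) and the toy inputs are made up for this example, and it assumes the filter `h` has the same length as the input `u`.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two.

    The inverse transform is returned unnormalized (caller divides by n).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def causal_conv_fft(u, h):
    """Causal convolution y[t] = sum_{s<=t} h[s] * u[t-s], in O(L log L)."""
    L = len(u)
    # Zero-pad to a power of two >= 2L so the circular convolution
    # computed by the FFT does not wrap around into the first L outputs.
    n = 1
    while n < 2 * L:
        n *= 2
    U = fft([complex(v) for v in u] + [0j] * (n - L))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    Y = fft([a * b for a, b in zip(U, H)], inverse=True)
    return [(y / n).real for y in Y[:L]]


def causal_conv_direct(u, h):
    """Same convolution by the definition: O(L^2), one sum per output."""
    L = len(u)
    return [sum(h[s] * u[t - s] for s in range(t + 1)) for t in range(L)]


# Toy input and filter, chosen only for illustration.
u = [1.0, 2.0, 3.0, 4.0]
h = [0.5, 0.25, 0.0, -1.0]
y_fast = causal_conv_fft(u, h)
y_slow = causal_conv_direct(u, h)
```

Both functions compute the same output. The direct version does work proportional to L² (the same growth rate as attention's pairwise score matrix), while the FFT version does work proportional to L log L, which is where the gap opens up at sequence lengths like 64k.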