<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>TensorPlay Blog</title>
    <link>https://blog.tensorplay.cn/</link>
    <description>Recent content on TensorPlay Blog</description>
    <generator>Hugo -- 0.157.0</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 10 Mar 2026 12:00:00 +0800</lastBuildDate><atom:link href="https://blog.tensorplay.cn/en/tags/linear-attention/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Talk Linear Attention</title>
      <link>https://blog.tensorplay.cn/en/posts/talk-linear-attention/</link>
      <pubDate>Tue, 10 Mar 2026 12:00:00 +0800</pubDate>
      <guid>https://blog.tensorplay.cn/en/posts/talk-linear-attention/</guid>
      <description>&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Since Vaswani et al. proposed the Transformer architecture in 2017, the Softmax Attention mechanism has been the core component of sequence modeling, underpinning the rapid development of Large Language Models (LLMs). However, Softmax Attention inherently suffers from &lt;strong&gt;quadratic computational complexity&lt;/strong&gt;: as the sequence length $L$ grows, the computational and memory overhead scales as $O(L^2 d)$, which has become a core bottleneck for long-sequence modeling and for efficient training and inference of LLMs.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>