<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>TensorPlay Blog</title>
    <link>https://blog.tensorplay.cn/</link>
    <description>Recent content on TensorPlay Blog</description>
    <generator>Hugo -- 0.157.0</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 10 Mar 2026 12:00:00 +0800</lastBuildDate><atom:link href="https://blog.tensorplay.cn/en/tags/linear-attention/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Talk Linear Attention</title>
      <link>https://blog.tensorplay.cn/en/posts/talk-linear-attention/</link>
      <pubDate>Tue, 10 Mar 2026 12:00:00 +0800</pubDate>
      <guid>https://blog.tensorplay.cn/en/posts/talk-linear-attention/</guid>
      <description>&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Since Vaswani et al. proposed the Transformer architecture in 2017, the Softmax Attention mechanism has been the core component of sequence modeling, underpinning the rapid development of Large Language Models (LLMs). However, Softmax Attention inherently suffers from &lt;strong&gt;quadratic computational complexity&lt;/strong&gt;: as the sequence length $L$ grows, the computational and memory overhead scales as $O(L^2 d)$, which has become a core bottleneck for long-sequence modeling and for efficient training and inference of LLMs.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>