Low-Rank Sparse Attention (Lorsa) Disentangles Atomic Attention Units
May 08, 2025 at 02:07 am
Large Language Models (LLMs) have gained significant attention in recent years, yet understanding their internal mechanisms remains challenging.
Large Language Models (LLMs) have recently come into the spotlight, yet comprehending their internal mechanisms remains a challenge. When examining individual attention heads in Transformer models, researchers have identified clear functionality in some of them. For instance, induction heads in the Pythia model predict tokens like ‘Potter’ following ‘Harry’ when the phrase has appeared earlier in context, and ablation studies confirm these heads’ causal role in model behaviour. However, most attention heads spread their focus across diverse contexts without any clear functionality.
The challenge lies in interpreting these complex attention patterns, because heads collaborate with one another rather than acting as isolated units. This is analogous to how neurons in neural networks can encode multiple unrelated features in a low-dimensional space, a phenomenon known as feature superposition. The research proposes an overcomplete sparse attention architecture, termed Low-Rank Sparse Attention (Lorsa), to decompose attention superposition in Multi-Head Self-Attention (MHSA) mechanisms, taking inspiration from Sparse Autoencoders (SAEs), which extract overcomplete sets of sparse, linearly interpretable features from neural networks.
Attention superposition arises from the hypothesis that MHSA comprises multiple attention units in superposition, each attending between specific token pairs and performing interpretable read/write operations on the residual stream. Under this hypothesis, an atomic attention unit may be spread across multiple MHSA heads, while an individual head may contain several attention units.
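To make the hypothesis concrete, the sketch below (not the authors' code; all tensor names and shapes are illustrative assumptions) shows what a single atomic attention unit might look like: it reads one direction from the residual stream, forms its own attention pattern, and writes a single direction back.

```python
# Minimal sketch of a hypothetical atomic attention unit with a rank-1 OV circuit.
import torch

def atomic_attention_unit(resid, w_q, w_k, w_v, w_o):
    """resid: [seq, d_model]; w_q, w_k: [d_model, d_qk]; w_v, w_o: [d_model] (rank-1 OV)."""
    seq = resid.shape[0]
    q = resid @ w_q                                  # [seq, d_qk]
    k = resid @ w_k                                  # [seq, d_qk]
    scores = q @ k.T / w_q.shape[1] ** 0.5
    # causal mask: each position attends only to itself and earlier tokens
    mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    attn = torch.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)
    v = resid @ w_v                                  # scalar "read" per source token, [seq]
    z = attn @ v                                     # unit activation per position, [seq]
    return z.unsqueeze(-1) * w_o                     # rank-1 "write" back to the residual stream
```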
Three key pieces of evidence support attention superposition. First, polysemantic heads respond to unrelated inputs, such as successor heads that increment days of the week and numbers while also exhibiting acronym and copying behaviours. Second, most attention heads lack clearly interpretable patterns, with studies reporting failed interpretation attempts for over 90% of GPT-2 heads. Third, direct observations show attention output features contributed collectively by multiple heads, with approximately 25% of learned attention units spanned by multiple MHSA heads.
This lack of interpretability is a major hurdle in attributing model behaviour to specific internal circuits. The structure of attention superposition may hold the key to this puzzle, as it raises the question of why certain attention units, like induction heads, are implemented by single MHSA heads while others exist in superposition.
To address this, Lorsa is trained to predict MHSA outputs by minimizing mean squared error. It employs one-dimensional OV circuits that restrict read/write operations to specific residual stream features, in line with the linear representation hypothesis. For the Query and Key weights, Lorsa shares parameters across each group of Lorsa QK heads, maintaining parameter efficiency while preserving performance. This strategy keeps Lorsa’s QK circuits similar to MHSA’s while imposing sparsity constraints on each OV dimension.
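A minimal sketch of this training setup, assuming a `lorsa` module that maps the residual stream to a reconstruction of the attention output (the names and shapes are assumptions, not the released implementation):

```python
# Hedged sketch of the objective described above: Lorsa is fit to reproduce the
# MHSA layer's output by minimizing mean squared error.
import torch
import torch.nn.functional as F

def lorsa_training_step(lorsa, optimizer, resid_pre, mhsa_out):
    """resid_pre: residual stream entering the attention block, [batch, seq, d_model];
    mhsa_out: the original multi-head attention output to reconstruct."""
    pred = lorsa(resid_pre)                 # [batch, seq, d_model]
    loss = F.mse_loss(pred, mhsa_out)       # reconstruction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```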
Lorsa employs orders of magnitude more heads than standard MHSA. For each position, Lorsa’s output aggregates only the top-K heads with the largest activation values, and the active head subset varies dynamically across token positions. This approach mirrors TopK-SAEs, which select the most salient linear components. However, Lorsa’s head activations derive from attention patterns over previous tokens rather than from simple linear encoders with ReLU.
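The per-position sparsification can be illustrated as follows; `head_acts`, `w_o`, and the value of K are assumptions for the sketch, not the paper's exact code:

```python
# Keep only the K Lorsa heads with the largest activations at each position,
# then aggregate their rank-1 writes into the output.
import torch

def topk_head_output(head_acts, w_o, k=64):
    """head_acts: [seq, n_heads] Lorsa head activations;
    w_o: [n_heads, d_model] one output direction per head."""
    vals, idx = head_acts.topk(k, dim=-1)        # top-K activations per position, [seq, k]
    sparse = torch.zeros_like(head_acts)
    sparse.scatter_(-1, idx, vals)               # zero out all but the top-K heads
    return sparse @ w_o                          # aggregated output, [seq, d_model]
```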
Lorsa’s interpretability assessment uses several key metrics to understand individual head functionality. Top activations help identify patterns by examining the 16 highest-activating tokens for each Lorsa head across 100 million samples from held-out data. The z pattern analysis decomposes activations linearly into token-wise contributions from preceding positions, revealing which previous tokens contribute to the current activation. This approach parallels the direct feature attribution analysis used for attention Sparse Autoencoders, but with simpler attribution involving just a single one-dimensional OV circuit and a single QK circuit.
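Because each Lorsa head has a one-dimensional OV circuit, this token-wise decomposition is exact. A hedged sketch, with illustrative variable names:

```python
# The activation at position i decomposes into per-source-token terms attn[i, j] * v[j].
import torch

def z_pattern(attn, resid, w_v):
    """attn: [seq, seq] head attention pattern; resid: [seq, d_model]; w_v: [d_model]."""
    v = resid @ w_v                 # scalar read per source token, [seq]
    contrib = attn * v              # [seq, seq]; row i gives token-wise contributions to position i
    z = contrib.sum(dim=-1)         # head activation at each position
    return z, contrib
```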
A visualisation dashboard provides comprehensive information about each Lorsa head. For example, a “you”-specific induction head shows several important patterns: it primarily reads from features indicating that the current token is “you”/“your” through its weight vector, strongly activates a “say you” feature that amplifies the logit of “you,” and increases prediction probabilities for various “you” tokens. The QK attention pattern computation involves current-token features at the query position and previous-token features at positions where the current token is “you,” with the previous token often being a word like “with,” “thank,” or “do.” Interestingly, this particular Lorsa head is almost equally distributed between two MHSA heads (5.0 and 5.7), demonstrating how Lorsa successfully disentangles attention units that exist across multiple standard attention heads.
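One simple way to gauge how a Lorsa head's write direction is spread across standard MHSA heads (an illustrative heuristic, not necessarily the paper's exact attribution metric; all tensor names are assumptions) is to project it onto each MHSA head's output subspace:

```python
# Compare the projection of a Lorsa head's output direction onto each MHSA
# head's W_O subspace; roughly equal overlaps suggest a unit split across heads.
import torch

def mhsa_head_attribution(lorsa_w_o, mhsa_w_o):
    """lorsa_w_o: [d_model] Lorsa head output direction;
    mhsa_w_o: [n_heads, d_head, d_model] per-head output matrices."""
    fracs = []
    for head_w_o in mhsa_w_o:                    # [d_head, d_model]
        q, _ = torch.linalg.qr(head_w_o.T)       # orthonormal basis of the head's output subspace
        proj = q @ (q.T @ lorsa_w_o)             # projection onto that subspace
        fracs.append(proj.norm() / lorsa_w_o.norm())
    return torch.stack(fracs)                    # relative overlap with each MHSA head
```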
The research, conducted by the Shanghai Innovation Institute, OpenMOSS Team, and Fudan University, evaluated Lorsa on both Pythia-160M and Llama-3.1-8B models. Using an exploration interface and a visualization dashboard, they quantitatively assessed Lorsa’s interpretability through top activations and attribution patterns.
The results showed that Lorsa's monosemanticity compares favorably to Sparse Autoencoder features. In Pythia-160M, Lorsa successfully identified known attention mechanisms such as induction heads, name mover heads, successor heads, and attention sinks, which had previously been discovered by researchers using techniques like activation patching.