RWKV-X: Linear-Time Long-Context Language Model
May 06, 2025 at 02:09 am
LLMs built on Transformer architectures face significant scaling challenges when processing long-context inputs because of their quadratic complexity in sequence length. Linear Attention models, State Space Models such as Mamba, Linear RNNs such as DeltaNet, and RWKV address this problem by reducing complexity to linear in sequence length. However, these linear architectures still struggle with long-context understanding. For instance, RWKV-7 (2.9B) achieves high accuracy on passkey retrieval up to 28K tokens but degrades rapidly beyond that point. Even with continual pretraining on 128K-length data, the long-context limitations persist. The issue extends beyond RWKV to other architectures such as Mamba, presenting a fundamental challenge for this class of models.
Linear-complexity language models are emerging as alternatives to Transformer-based architectures, which suffer from quadratic computational cost on long sequences. The RWKV model series combines Transformer-style parallelizability during training with an RNN-like recurrent state representation, and has evolved through multiple iterations, from the foundational RWKV-4 to RWKV-5, RWKV-6, and RWKV-7. Hybrid language models such as Jamba, Zamba, and MiniMax combine full attention with more efficient sequence-mixing layers in different ways. On the sparse-attention side, Native Sparse Attention (NSA) organizes tokens into temporal blocks with three distinct attention paths: compressed coarse-grained tokens, selectively retained fine-grained tokens, and sliding windows for local contextual information. Other sparse-attention approaches include SeerAttention and Mixture of Block Attention (MoBA).
Researchers from the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, Hohai University, Nanjing, Shenzhen University, and Qinghai University, Xining, have proposed a hybrid architecture called RWKV-X that combines RWKV's efficiency for short-range modeling with a sparse attention mechanism designed to capture long-range context. Unlike previous hybrid approaches, RWKV-X achieves linear-time complexity during training and constant-time complexity during inference decoding. It reaches near-perfect accuracy on the 64K passkey retrieval benchmark when continually pretrained on 64K-token sequences. The model consistently outperforms the previous RWKV-7 models on long-context benchmarks while maintaining strong performance on short-context tasks.
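To make the hybrid layout concrete, here is a minimal sketch of the interleaving idea: most layers are cheap, short-range recurrent mixers, and a few layers are attention layers intended to capture long-range context. This is not the authors' implementation; a GRU stands in for the RWKV time-mixing block, dense multi-head attention stands in for the sparse attention module, and all class names and hyperparameters (`HybridStack`, `attn_every`, etc.) are chosen for illustration only.

```python
import torch
import torch.nn as nn

class RecurrentMixer(nn.Module):
    """Stand-in for an RWKV-style time-mixing block: cost is linear in
    sequence length and decoding only needs a fixed-size recurrent state."""
    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out

class AttentionMixer(nn.Module):
    """Stand-in for the long-range module; RWKV-X uses sparse attention here,
    dense multi-head attention is used below only to keep the sketch short."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

class HybridStack(nn.Module):
    def __init__(self, dim=256, depth=8, attn_every=4):
        super().__init__()
        # Place a long-range attention layer every `attn_every` layers;
        # the remaining layers are recurrent mixers.
        self.layers = nn.ModuleList([
            AttentionMixer(dim) if (i + 1) % attn_every == 0 else RecurrentMixer(dim)
            for i in range(depth)
        ])
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(depth)])

    def forward(self, x):
        for norm, layer in zip(self.norms, self.layers):
            x = x + layer(norm(x))  # pre-norm residual connection
        return x

x = torch.randn(2, 128, 256)   # (batch, sequence, hidden)
print(HybridStack()(x).shape)  # torch.Size([2, 128, 256])
```

The design choice the sketch tries to capture is that the expensive global mixer appears only at a fraction of the depth, so the bulk of the computation stays linear in sequence length.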
The authors present a two-stage training method for efficient preheating and fine-tuning of RWKV-X. In the first stage, they use short sequences (4,096 tokens) to preheat the model quickly. They then perform multi-stage pretraining with increasing sequence lengths so the model gradually learns to process longer inputs. The approach is inspired by LLaMA Pro's zero-initialization technique, in which the newly added parameters of the expanded layers are initialized to zero. In contrast to LLaMA Pro's single-stage training, which can lead to instability, RWKV-X adopts a two-stage schedule with a preheating stage to ensure stability.
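The point of zero-initialization is that a newly inserted block contributes nothing through its residual branch at the start of training, so the expanded model initially behaves exactly like the base model. The sketch below illustrates that property; the block structure, the `out_proj` naming, and the helper itself are assumptions made for illustration, not the RWKV-X or LLaMA Pro code.

```python
# Illustrative sketch of zero-initializing a newly added block so that the
# residual update x + block(x) equals x at the start of training.
import torch
import torch.nn as nn

class ExpandedBlock(nn.Module):
    """A hypothetical new block inserted into a pretrained stack."""
    def __init__(self, dim):
        super().__init__()
        self.mix = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)  # zeroed below

    def forward(self, x):
        return self.out_proj(torch.tanh(self.mix(x)))

def zero_init_output_proj(block: nn.Module, name: str = "out_proj"):
    """Zero the block's output projection so block(x) == 0 at initialization."""
    proj = getattr(block, name)
    nn.init.zeros_(proj.weight)
    if proj.bias is not None:
        nn.init.zeros_(proj.bias)

block = ExpandedBlock(16)
zero_init_output_proj(block)
x = torch.randn(2, 8, 16)
assert torch.allclose(x + block(x), x)  # the expanded network starts as an identity map
```

Starting from this identity behavior, the short-sequence preheating stage can adapt the new blocks before the longer-context stages begin.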
Short-context evaluation reveals that RWKV-X maintains competitive performance across standard benchmarks. The smaller variant, RWKV-X (0.22B), achieves an average score of 51.0, comparable to RWKV-7's 51.8. At a larger scale, RWKV-X (3.6B) reaches 71.9, closely matching RWKV-7 (2.9B, 72.8) and Qwen2.5-3B (71.4) while surpassing LLaMA3.2-3B (69.7). These results confirm RWKV-X's effectiveness as a general-purpose LLM backbone without sacrificing performance on shorter contexts. Efficiency analysis further demonstrates RWKV-X's superior scaling on long sequences: at 128K tokens, it achieves a 1.37× speedup over Flash-Attention v3, and the advantage widens as context length increases.
In conclusion, the researchers introduced RWKV-X, a hybrid language model that combines RWKV's efficiency for modeling short-range dependencies with a novel sparse attention mechanism designed specifically for long-range context modeling. While RWKV-X demonstrates strong performance and efficiency in long-context language modeling, several limitations remain. First, its sparse attention mechanism, which relies on top-k chunk selection, is a heuristic that may overlook semantically relevant dependencies. Second, in the current implementation, sparse-attention decoding runs slower than vanilla RWKV, indicating that further engineering is needed to optimize performance.
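For intuition about the top-k chunk-selection heuristic mentioned above, here is a small sketch: keys are grouped into fixed-size chunks, each chunk is summarized by mean pooling, and every query keeps only its k highest-scoring chunks for attention. The mean-pooled scoring rule, the function name, and the default chunk size are assumptions for illustration and are not taken from the RWKV-X implementation.

```python
# Illustrative top-k chunk selection: score whole chunks of keys against each
# query and keep only the indices of the best-scoring chunks. Attention would
# then be computed only inside the selected chunks (omitted here for brevity).
import torch
import torch.nn.functional as F

def select_top_k_chunks(q, k, chunk_size=64, top_k=4):
    """q, k: (batch, seq_len, dim). Returns (batch, seq_len, top_k) chunk indices."""
    B, T, D = k.shape
    n_chunks = (T + chunk_size - 1) // chunk_size
    pad = n_chunks * chunk_size - T
    k_padded = F.pad(k, (0, 0, 0, pad))                                # pad the time dimension
    chunk_summary = k_padded.view(B, n_chunks, chunk_size, D).mean(2)  # (B, n_chunks, D)
    scores = torch.einsum("btd,bcd->btc", q, chunk_summary)            # (B, T, n_chunks)
    return scores.topk(min(top_k, n_chunks), dim=-1).indices

q = torch.randn(1, 300, 32)
k = torch.randn(1, 300, 32)
print(select_top_k_chunks(q, k).shape)  # torch.Size([1, 300, 4])
```

Because each chunk is reduced to a single pooled vector before scoring, fine-grained but semantically relevant tokens can be missed, which is precisely the limitation the authors point out.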