Learning the Concepts Behind Words, Not Just Predicting the Next Token

2025/06/12 13:32

Efforts such as CoCoMix (Jihoon et al., 2025) by Meta pursue conceptual learning, that is, learning the concepts behind words rather than merely predicting the next token.

In the dynamic sphere of artificial intelligence, a persistent pursuit has been the development of language models capable not only of syntactic analysis but also of semantic comprehension, enabling them to engage in conversations on a conceptual level. This capability, often termed "conceptual learning," stands in contrast to the shallower analysis that focuses on predicting the next token in a sequence.

While efforts like CoCoMix (Jihoon et al., 2025)¹ by Meta have brought us closer to this goal, introducing models that are remarkably steerable and interpretable, another core question arises. Even a conceptually brilliant model could struggle with nuanced or factual recall challenges after training, during actual deployment.

Imagine asking a seemingly simple question like, "Earlier in our 2-million-token conversation, where did we discuss Pinocchio's famously growing nose?" No matter how conceptually capable the LLM is, it cannot answer this simple question if the answer lies outside its context window.

But this is precisely the kind of adaptability that humans effortlessly display. We can engage in a conversation about 19th-century Impressionist art, quickly recall a story from earlier in the day, and then seamlessly transition to discussing the best route to avoid traffic. A human guide could quickly glance at a map and suggest a clever alley shortcut, something a GPS system would struggle with despite knowing the shortest path.

This ability to integrate new information and experiences into an ongoing narrative, adjusting plans and adapting to unexpected events, is crucial for meaningful communication and interaction with the world around us.

Now, a team of researchers at Google, in collaboration with researchers from Stanford University and the University of California, Irvine, has taken a significant step toward equipping large language models with this adaptable "memory" or performance boost precisely when it counts: during inference. Their findings are published in the journal Patterns.

Their research builds upon the groundbreaking work in introducing the Transformer architecture (Vaswani et al., 2017)², which quickly became ubiquitous in the modern AI landscape.

Building on the breakout success of Transformers and the surprising results of applying attention to domain after domain, including vision tasks (Dosovitskiy et al., 2020)³, time-series forecasting (Zerveas et al., 2021)⁴, and the remarkable performance of Transformers in natural language processing (Rogers et al., 2021)⁵, the researchers went deeper.

As the reliance on large models deepened and compute budgets expanded, even this "do it all" architecture began to show its limits, and so began the push to stretch its capabilities even further.

The bottleneck was attention's "everyone-talks-to-everyone" approach: brilliantly effective but quadratically expensive. Imagine a room of a million people, where each person must remember every conversation with everyone else. This restricted Transformers to a narrow "working memory," leaving them struggling with the "long-term recall" needed to understand vast documents, as early information simply faded away.
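
To make the quadratic cost concrete, here is a minimal sketch of standard scaled dot-product attention in PyTorch (illustrative only, not the authors' code): the score matrix holds one entry per pair of tokens, so its size grows with the square of the sequence length.

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """Vanilla attention: every token attends to every other token.

    q, k, v: (seq_len, d_model) tensors.
    The (seq_len, seq_len) score matrix is the quadratic bottleneck.
    """
    d_model = q.size(-1)
    scores = q @ k.T / d_model ** 0.5        # (seq_len, seq_len) pairwise scores
    weights = torch.softmax(scores, dim=-1)  # each row is a distribution over all tokens
    return weights @ v                       # every output mixes all values

n, d = 4096, 64
q = k = v = torch.randn(n, d)
out = scaled_dot_product_attention(q, k, v)
# The scores tensor alone holds n * n ≈ 16.8M floats; doubling n quadruples it.
```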

Moreover, vanilla Transformers faced another fundamental hurdle: a lack of adaptability after training. While they excelled at applying their vast pre-trained knowledge to predict the next token, a process of sophisticated reasoning and prediction, this was not the same as true learning.

Like Google Maps, which quickly finds the shortest path but then directs you through the barricades of an ongoing construction site, even as a human guide would immediately suggest a simple alley shortcut, Transformers struggled to integrate new information into their existing knowledge.

This inability to "learn on the fly" from the data they are currently processing, adjusting their strategies and memories, represents a critical limitation for tasks requiring continuous adaptation or memory of novel experiences beyond the training set.

Instead of focusing narrowly on one limitation, the researchers took a broader perspective: how do intelligent systems, like the human brain, manage memory and adapt to new situations? It's not about having one massive, ever-accessible memory; it's a more flexible setup, where different components coordinate to handle different kinds of information and experiences.

The Titans architecture (Behrouz et al., 2025)⁶, named for the mythological beings known for their wisdom and adaptability, embraces this, built not around a single, monolithic attention block but around a cooperative team of specialized memory systems.

Each memory module in Titans plays a crucial role in understanding and responding to the task at hand. The persistent memory module (PM) stores a set of parameters that are prepended to the input sequence. These parameters are learned during training, are independent of the current input, and act like a "Holy Grail" of task knowledge for the model to adhere to.
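
As a rough sketch of the idea (the class and parameter names here are illustrative assumptions, not the authors' code), persistent memory can be pictured as learnable "memory tokens" concatenated in front of the token embeddings before attention runs:

```python
import torch
import torch.nn as nn

class PersistentMemory(nn.Module):
    """Input-independent memory: learnable vectors prepended to the sequence."""

    def __init__(self, num_mem_tokens: int, d_model: int):
        super().__init__()
        # Learned during training, fixed at inference time.
        self.mem = nn.Parameter(torch.randn(num_mem_tokens, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings.
        batch = x.size(0)
        mem = self.mem.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([mem, x], dim=1)  # (batch, num_mem + seq_len, d_model)

pm = PersistentMemory(num_mem_tokens=16, d_model=64)
x = torch.randn(2, 128, 64)
print(pm(x).shape)  # torch.Size([2, 144, 64])
```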

The researchers chose to implement the long-term memory module (LMM) as a simple multi-layer perceptron (MLP) network, which takes as input the output of the standard self-attention module (STM) at time step t, denoted yₜ.
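
A minimal sketch of what such a module might look like (the update rule, loss, and learning rate below are simplifying assumptions for illustration, not the published method): an MLP that reads yₜ, plus a gradient step taken at inference time so the memory's weights can absorb new information on the fly.

```python
import torch
import torch.nn as nn

class LongTermMemory(nn.Module):
    """A two-layer MLP memory over the attention output y_t (sketch only)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.SiLU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, y_t: torch.Tensor) -> torch.Tensor:
        return self.net(y_t)

    def update(self, y_t: torch.Tensor, target: torch.Tensor, lr: float = 1e-2):
        # One gradient step on a simple reconstruction loss (an assumption,
        # standing in for the surprise-driven update): the weights change
        # *during inference*, which is what "learning on the fly" means here.
        loss = (self.net(y_t) - target).pow(2).mean()
        grads = torch.autograd.grad(loss, list(self.net.parameters()))
        with torch.no_grad():
            for p, g in zip(self.net.parameters(), grads):
                p -= lr * g

lmm = LongTermMemory(d_model=64, d_hidden=256)
y_t = torch.randn(8, 64)
lmm.update(y_t, target=y_t)  # memorize the current context at test time
```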
