Learning the Concepts Behind Words, Not Just Predicting the Next Token

2025/06/12 13:32

Efforts such as CoCoMix (Jihoon et al., 2025), Meta's work on concept learning: learning the concepts behind words rather than merely predicting the next token.

In the dynamic sphere of artificial intelligence, a persistent pursuit has been the development of language models capable not only of syntactic analysis but also of semantic comprehension, enabling them to engage in conversations on a conceptual level. This capability, often termed "conceptual learning," stands in contrast to the shallower analysis that focuses on predicting the next token in a sequence.

While efforts like CoCoMix (Jihoon et al., 2025)¹ by Meta have brought us closer to this goal, introducing models that are remarkably steerable and interpretable, another core question arises. Even a conceptually brilliant model could struggle with nuanced or factual recall challenges after training, during actual deployment.

Imagine asking a seemingly simple question like, “Earlier in our 2-million-token conversation, where did we discuss Pinocchio’s famously growing nose?” No matter how conceptually capable the LLM is, it cannot answer this simple question if the answer lies outside its context window.

But this is precisely the kind of adaptability that humans effortlessly display. We can engage in a conversation about 19th-century Impressionist art, quickly recall a story from earlier in the day, and then seamlessly transition to discussing the best route to avoid traffic. A human guide could quickly glance at a map and suggest a clever alley shortcut, something a GPS system would struggle with despite knowing the shortest path.

This ability to integrate new information and experiences into an ongoing narrative, adjusting plans and adapting to unexpected events, is crucial for meaningful communication and interaction with the world around us.

Now, a team of researchers at Google, in collaboration with researchers from Stanford University and the University of California, Irvine, has taken a significant step toward equipping large language models with this adaptable “memory” or performance boost precisely when it counts—during inference. Their findings are published in the journal Patterns.

Their research builds upon the groundbreaking work that introduced the Transformer architecture (Vaswani et al., 2017)², which quickly became ubiquitous in the modern AI landscape.

From the breakout success of Transformers and the surprising results of applying attention to various domains—vision tasks with Transformers (Dosovitskiy et al., 2020)³, time series forecasting with Transformers (Zerveas et al., 2021)⁴, and the remarkable performance of Transformers in natural language processing (Rogers et al., 2021)⁵—the researchers went deeper.

As the reliance on large models deepened and compute budgets expanded, even this “do it all” architecture began to show its limits, and so began the push to stretch its capabilities even further.

The bottleneck was attention’s ‘everyone-talks-to-everyone’ approach. Brilliantly efficient but quadratically expensive—imagine a room of a million people, where each person must remember every conversation with everyone. This restricted Transformers to a narrow “working memory,” struggling with the “long-term recall” needed for understanding vast documents, as early information simply faded away.

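The quadratic blow-up described above is easy to make concrete with a little arithmetic: full attention scores every token against every other token, so the work grows with the square of the sequence length. A minimal illustrative sketch (not the authors' code):

```python
# Each token attends to every other token, so the score matrix
# grows as n^2 while the input grows only as n.
def attention_score_entries(n_tokens: int) -> int:
    """Number of pairwise query-key scores a full attention layer computes."""
    return n_tokens * n_tokens

# Doubling the sequence length quadruples the attention cost.
assert attention_score_entries(2048) == 4 * attention_score_entries(1024)

# At 1M tokens, the score matrix alone has 10^12 entries --
# the "room of a million people" where everyone remembers every conversation.
print(attention_score_entries(1_000_000))  # 1000000000000
```

This is why early information "fades away": keeping the full pairwise matrix over very long histories is simply unaffordable, so context windows stay narrow.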
Moreover, vanilla transformers faced another fundamental hurdle—a lack of adaptability after training. While they excelled at applying their vast pre-trained knowledge to predict the next token, a process of sophisticated reasoning and prediction, this was not the same as true learning.

Like Google Maps, which quickly finds the shortest path but then wants you to drive through barricades because of ongoing construction, despite a human guide immediately suggesting a simple alley shortcut, transformers struggled to integrate new information into their existing knowledge.

This inability to “learn on the fly” from the data they are currently processing, adjusting their strategies and memories, represents a critical limitation for tasks requiring continuous adaptation or memory of novel experiences beyond the training set.

Instead of focusing narrowly on one limitation, the researchers took a broader perspective: how do intelligent systems, like the human brain, manage memory and adapt to new situations? It’s not about having one massive, ever-accessible memory; it’s a more flexible setup, where different components coordinate to handle different kinds of information and experiences.

The Titans architecture (Behrouz et al., 2025)⁶, named for the mythological beings known for their wisdom and adaptability, embraces this design: it is built not around a single, monolithic attention block but around a cooperative team of specialized memory systems.

Each memory module in Titans plays a crucial role in understanding and responding to the task at hand. The persistent memory module (PM) stores a set of learnable parameters that are prepended to the input sequence. These parameters are fixed once training ends and act as stable, task-level knowledge for the model to adhere to.

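The prepending step can be sketched in a few lines. Everything here (the array shapes, the variable names, the use of NumPy) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

SEQ_LEN, D_MODEL, N_PERSISTENT = 8, 16, 4
rng = np.random.default_rng(0)

# Learnable, data-independent persistent-memory parameters:
# trained once, then fixed at inference time.
persistent_memory = rng.normal(size=(N_PERSISTENT, D_MODEL))

def prepend_persistent(x: np.ndarray) -> np.ndarray:
    """Prepend persistent-memory tokens to the input sequence.

    x: (seq_len, d_model) token embeddings for the current input.
    Returns (n_persistent + seq_len, d_model): the memory tokens sit at
    the front, so attention over the sequence can always read the stored
    task knowledge regardless of what the input contains.
    """
    return np.concatenate([persistent_memory, x], axis=0)

x = rng.normal(size=(SEQ_LEN, D_MODEL))
out = prepend_persistent(x)
assert out.shape == (N_PERSISTENT + SEQ_LEN, D_MODEL)
```

Because the memory rows do not depend on the input, they behave like a fixed preamble the model consults on every forward pass.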
The researchers chose to implement the LMM (long-term memory module) as a simple multi-layer perceptron (MLP), which takes as input yt, the output of the standard self-attention module (the short-term memory, STM) at time step t.

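As a rough sketch of that description: a small MLP reads the attention output yt and, unlike frozen weights, can take a gradient step at inference time so that the current sequence reshapes the memory. The layer sizes, the reconstruction-style "surprise" signal, and the single-step update rule below are all simplifying assumptions for illustration, not the authors' exact formulation:

```python
import numpy as np

D_MODEL, D_HIDDEN = 16, 32
rng = np.random.default_rng(1)

# Two-layer MLP standing in for the neural long-term memory (LMM).
W1 = rng.normal(scale=0.1, size=(D_MODEL, D_HIDDEN))
W2 = rng.normal(scale=0.1, size=(D_HIDDEN, D_MODEL))

def lmm_forward(y_t: np.ndarray) -> np.ndarray:
    """Read from memory: map the attention output y_t through the MLP."""
    return np.maximum(y_t @ W1, 0.0) @ W2  # ReLU hidden layer

def lmm_update(y_t: np.ndarray, lr: float = 0.01) -> None:
    """Write to memory at inference time: one gradient-descent step on a
    reconstruction loss, so the weights adapt to the data being processed."""
    global W1, W2
    h = np.maximum(y_t @ W1, 0.0)
    err = h @ W2 - y_t                # how badly memory reconstructs y_t
    grad_W2 = np.outer(h, err)
    grad_W1 = np.outer(y_t, (err @ W2.T) * (h > 0))
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

y_t = rng.normal(size=D_MODEL)
before = float(np.mean((lmm_forward(y_t) - y_t) ** 2))
lmm_update(y_t)
after = float(np.mean((lmm_forward(y_t) - y_t) ** 2))
assert after < before  # the memory moved toward storing y_t
```

The key contrast with a vanilla transformer is that the MLP's weights change during inference: the "learning on the fly" the article describes happens in the update step, not in a frozen forward pass.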