Learning the Concepts Behind Words Instead of Predicting the Next Token

2025/06/12 13:32

Efforts such as CoCoMix (Jihoon et al., 2025)¹ from Meta have made conceptual learning, that is, learning the concepts behind words rather than predicting the next token, a reality.

In the dynamic sphere of artificial intelligence, a persistent pursuit has been the development of language models capable not only of syntactic analysis but also of semantic comprehension, enabling them to engage in conversations on a conceptual level. This capability, often termed "conceptual learning," stands in contrast to the shallower analysis that focuses on predicting the next token in a sequence.

While efforts like CoCoMix (Jihoon et al., 2025)¹ by Meta have brought us closer to this goal, introducing models that are remarkably steerable and interpretable, another core question arises. Even a conceptually brilliant model could struggle with nuanced or factual recall challenges after training, during actual deployment.

Imagine asking a seemingly simple question like, “Earlier in our 2-million-token conversation, where did we discuss Pinocchio’s famously growing nose?” No matter how conceptually capable the LLM is, it cannot answer this simple question if the answer lies outside its context window.
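
The limit described here can be illustrated with a minimal sketch. It assumes the model only ever sees the most recent tokens of the conversation; the function name `visible_context`, the 128,000-token window, and the placeholder tokens are hypothetical choices for illustration, not details from the article.

```python
def visible_context(tokens, window):
    """Return the most recent `window` tokens, i.e. what the model can still see."""
    if window <= 0:
        return []
    return tokens[-window:]


# Hypothetical 2-million-token conversation with a single early mention.
conversation = ["filler"] * 2_000_000
conversation[10] = "pinocchio"

WINDOW = 128_000  # assumed context size, chosen only for illustration
context = visible_context(conversation, WINDOW)

# The early mention has fallen outside the window, so it cannot be recalled.
print(len(context), "pinocchio" in context)
```

However capable the model is, the early mention is simply not part of its input once the conversation outgrows the window.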

But this is precisely the kind of adaptability that humans effortlessly display. We can engage in a conversation about 19th-century Impressionist art, quickly recall a story from earlier in the day, and then seamlessly transition to discussing the best route to avoid traffic. A human guide could quickly glance at a map and suggest a clever alley shortcut, something a GPS system would struggle with despite knowing the shortest path.

This ability to integrate new information and experiences into an ongoing narrative, adjusting plans and adapting to unexpected events, is crucial for meaningful communication and interaction with the world around us.

Now, a team of researchers at Google, in collaboration with researchers from Stanford University and the University of California, Irvine, has taken a significant step toward giving large language models this kind of adaptable “memory,” a performance boost that arrives precisely when it counts: during inference. Their findings are published in the journal Patterns.

Their research builds on the groundbreaking work that introduced the Transformer architecture (Vaswani et al., 2017)², which quickly became ubiquitous in the modern AI landscape.

Transformers were a breakout success, and applying attention to other domains produced surprising results: vision tasks (Dosovitskiy et al., 2020)³, time series forecasting (Zerveas et al., 2021)⁴, and remarkable performance in natural language processing (Rogers et al., 2021)⁵. Against that backdrop, the researchers went deeper.

As the reliance on large models deepened and compute budgets expanded, even this “do it all” architecture began to show its limits, and so began the push to stretch its capabilities even further.

The bottleneck was attention’s ‘everyone-talks-to-everyone’ approach. It is brilliantly effective but quadratically expensive: imagine a room of a million people in which each person must remember every conversation with everyone else. This restricted Transformers to a narrow “working memory,” and they struggled with the “long-term recall” needed to understand vast documents, as early information simply faded away.
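
To make the quadratic cost concrete, here is a small back-of-the-envelope sketch. It assumes full self-attention, where every token is scored against every token, and 2 bytes per score (half precision); the function names are hypothetical. Practical implementations may avoid storing the whole score matrix at once, but the number of pairwise scores still grows with the square of the sequence length.

```python
def attention_score_entries(seq_len: int) -> int:
    """Pairwise scores in full self-attention: one per (query, key) pair."""
    return seq_len * seq_len


def score_matrix_bytes(seq_len: int, bytes_per_entry: int = 2) -> int:
    """Memory to hold one full score matrix (2 bytes per entry is an assumption)."""
    return attention_score_entries(seq_len) * bytes_per_entry


for n in (1_000, 10_000, 1_000_000):
    entries = attention_score_entries(n)
    gigabytes = score_matrix_bytes(n) / 1e9
    print(f"{n:>9,} tokens -> {entries:>17,} scores, {gigabytes:,.3f} GB")
```

Doubling the sequence length quadruples the work, which is why a million-token "room" is so much harder than a thousand-token one.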

Moreover, vanilla Transformers faced another fundamental hurdle: a lack of adaptability after training. They excelled at applying their vast pre-trained knowledge to predict the next token, a process of sophisticated reasoning and prediction, but this was not the same as true learning.

Think of Google Maps, which quickly finds the shortest path but then wants you to drive through barricades because of ongoing construction, while a human guide would immediately suggest a simple alley shortcut. In the same way, Transformers struggled to integrate new information into their existing knowledge.

This inability to “learn on the fly” from the data they are currently processing, adjusting their strategies and memories, represents a critical limitation for tasks requiring continuous adaptation or memory of novel experiences beyond the training set.

Instead of focusing narrowly on one limitation, the researchers took a broader perspective: how do intelligent systems, like the human brain, manage memory and adapt to new situations? It’s not about having one massive, ever-accessible memory; it’s a more flexible setup, where different components coordinate to handle different kinds of information and experiences.

The Titans architecture (Behrouz et al., 2025)⁶, named for the mythological beings known for their wisdom and adaptability, embraces this idea: it is built not around a single, monolithic attention block but around a cooperative team of specialized memory systems.

Each memory module in Titans plays a crucial role in understanding and responding to the task at hand. The persistent memory module (PM) stores a set of parameters that are prepended to the input sequence. These parameters are learned during training and act like a “Holy Grail” for the model to adhere to.
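
The idea of parameters being prepended to the input sequence can be sketched as follows. This is a minimal illustration, not the authors' implementation: the PM parameters are modeled as a fixed list of vectors that would be learned during training, and the names `make_pm_tokens` and `prepend_pm`, along with the sizes and the random initialization, are assumptions.

```python
import random


def make_pm_tokens(num_tokens, dim, seed=0):
    """Stand-in for the learned PM parameters (random here, trained in practice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_tokens)]


def prepend_pm(pm_tokens, input_embeddings):
    """Place the same PM vectors in front of every input sequence."""
    return pm_tokens + input_embeddings


pm_tokens = make_pm_tokens(num_tokens=4, dim=8)
sequence = [[0.0] * 8 for _ in range(10)]  # hypothetical 10-token input

extended = prepend_pm(pm_tokens, sequence)
print(len(extended))  # 4 PM vectors followed by the 10 input vectors
```

Because the PM vectors do not depend on the input, every sequence starts from the same learned prefix.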

The researchers chose to implement the long-term memory module (LMM) as a simple multi-layer perceptron (MLP) network, which takes as input y_t, the output of the standard self-attention module (STM) at time step t.
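
A minimal sketch of an MLP that takes the attention output at step t as its input might look like the following. The two-layer shape, the tanh activation, the absence of bias terms, and the class name `TinyMLPMemory` are all assumptions made for illustration. This excerpt does not describe how the LMM is updated during inference, so only the forward pass is shown.

```python
import math
import random


class TinyMLPMemory:
    """Hypothetical two-layer MLP standing in for the LMM (forward pass only)."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        # Weight matrices stored as lists of rows; no bias terms, for brevity.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def forward(self, y_t):
        """Map the attention output at step t to a vector of the same size."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, y_t))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


memory = TinyMLPMemory(dim=8, hidden=16)
y_t = [0.1 * i for i in range(8)]  # placeholder for the attention output
out = memory.forward(y_t)
print(len(out))
```

The output has the same dimensionality as the input, so it could be combined with the attention output downstream; how that combination works is outside what this excerpt covers.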
