Learning Concepts Behind Words Instead of Just Predicting the Next Token
Jun 12, 2025 at 01:32 pm
Efforts like CoCoMix (Tack et al., 2025)¹ by Meta have made conceptual learning, i.e., learning the concepts behind words instead of just predicting the next token, a reality.
In the dynamic sphere of artificial intelligence, a persistent pursuit has been the development of language models capable not only of syntactic analysis but also of semantic comprehension, enabling them to engage in conversations on a conceptual level. This capability, often termed "conceptual learning," stands in contrast to the shallower analysis that focuses on predicting the next token in a sequence.
While efforts like CoCoMix (Tack et al., 2025)¹ by Meta have brought us closer to this goal, introducing models that are remarkably steerable and interpretable, another core question arises: even a conceptually brilliant model can struggle with nuanced or factual recall challenges after training, during actual deployment.
Imagine asking a seemingly simple question like, “Earlier in our 2-million-token conversation, where did we discuss Pinocchio’s famously growing nose?” No matter how conceptually capable the LLM is, it cannot answer this simple question if the answer lies outside its context window.
But this is precisely the kind of adaptability that humans effortlessly display. We can engage in a conversation about 19th-century Impressionist art, quickly recall a story from earlier in the day, and then seamlessly transition to discussing the best route to avoid traffic. A human guide could quickly glance at a map and suggest a clever alley shortcut, something a GPS system would struggle with despite knowing the shortest path.
This ability to integrate new information and experiences into an ongoing narrative, adjusting plans and adapting to unexpected events, is crucial for meaningful communication and interaction with the world around us.
Now, a team of researchers at Google has taken a significant step toward equipping large language models with this kind of adaptable “memory,” a boost that arrives precisely when it counts: during inference.
Their research builds upon the groundbreaking work that introduced the Transformer architecture (Vaswani et al., 2017)², which quickly became ubiquitous in the modern AI landscape.
Attention proved remarkably general: beyond natural language processing (Rogers et al., 2021)⁵, Transformers delivered surprising results in vision (Dosovitskiy et al., 2020)³ and time series forecasting (Zerveas et al., 2021)⁴. Building on that breakout success, the researchers went deeper.
As the reliance on large models deepened and compute budgets expanded, even this “do it all” architecture began to show its limits, and so began the push to stretch its capabilities even further.
The bottleneck was attention’s ‘everyone-talks-to-everyone’ approach. Brilliantly effective, but quadratically expensive: imagine a room of a million people, where each person must remember every conversation with everyone. This restricted Transformers to a narrow “working memory,” struggling with the “long-term recall” needed for understanding vast documents, as early information simply faded away.
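To make that scaling concrete, here is a minimal sketch (plain NumPy, my own toy illustration rather than any model's actual code) of single-head attention weights. The weight matrix alone has seq_len × seq_len entries, so doubling the sequence length quadruples the memory and compute:

```python
import numpy as np

def attention_weights(x: np.ndarray) -> np.ndarray:
    """Toy single-head self-attention for a (seq_len, d_model) input.
    Every token scores against every other token, so the weight
    matrix is (seq_len, seq_len): quadratic in sequence length."""
    d_model = x.shape[-1]
    scores = (x @ x.T) / np.sqrt(d_model)         # identity Q/K projections
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

for seq_len in (256, 2048):
    w = attention_weights(np.random.randn(seq_len, 64))
    print(f"seq_len={seq_len}: weights {w.shape}, {w.nbytes / 1e6:.1f} MB")
```

An 8× longer sequence here costs 64× the memory for the weight matrix, which is exactly why a million-token context is out of reach for vanilla attention.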
Moreover, vanilla transformers faced another fundamental hurdle: a lack of adaptability after training. They excelled at applying their vast pre-trained knowledge to predict the next token, a process of sophisticated reasoning and prediction, but this was not the same as true learning.
Think of Google Maps: it quickly finds the shortest path, then asks you to drive through barricades at an ongoing construction site, while a human guide would immediately suggest a simple alley shortcut. In the same way, transformers struggled to integrate new information into their existing knowledge.
This inability to “learn on the fly” from the data they are currently processing, adjusting their strategies and memories, represents a critical limitation for tasks requiring continuous adaptation or memory of novel experiences beyond the training set.
Instead of focusing narrowly on one limitation, the researchers took a broader perspective: how do intelligent systems, like the human brain, manage memory and adapt to new situations? It’s not about having one massive, ever-accessible memory; it’s a more flexible setup, where different components coordinate to handle different kinds of information and experiences.
The Titans architecture (Behrouz et al., 2025)⁶, named for the mythological beings known for their wisdom and adaptability, embraces this, built not around a single, monolithic attention block but around a cooperative team of specialized memory systems.
Each memory module in Titans plays a crucial role in understanding and responding to the task at hand. The persistent memory module (PM) stores a set of learnable, input-independent parameters that are prepended to the input sequence. These parameters are learned during training and act like a “Holy Grail” of task knowledge for the model to adhere to.
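As a rough illustration, here is a PyTorch-style sketch of prepending learnable, input-independent memory vectors to a sequence. The class name, sizes, and initialization are my own assumptions for the sketch, not details from the paper:

```python
import torch
import torch.nn as nn

class PersistentMemory(nn.Module):
    """Illustrative: n_mem learnable, input-independent vectors
    prepended to the embedded sequence before attention."""
    def __init__(self, n_mem: int, d_model: int):
        super().__init__()
        self.mem = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> (batch, n_mem + seq_len, d_model)
        mem = self.mem.unsqueeze(0).expand(x.size(0), -1, -1)
        return torch.cat([mem, x], dim=1)

x = torch.randn(2, 16, 64)               # dummy embedded tokens
print(PersistentMemory(4, 64)(x).shape)  # torch.Size([2, 20, 64])
```

Because these vectors do not depend on the input, they behave like fixed task knowledge that every token can attend to, no matter how long the conversation gets.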
The researchers chose to implement the long-term memory module (LMM) with a simple multi-layer perceptron (MLP) network, which takes as input the output of the attention-based short-term memory module (STM) at time step t, denoted y_t.
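The key trick is that this memory network keeps learning while the model runs. The sketch below is a simplification under stated assumptions (the actual Titans update rule also includes momentum and a forgetting gate, and the learning rate here is arbitrary): treat y_t as something to be memorized and take one gradient step on the memory MLP at inference time, with larger, more “surprising” errors driving larger updates.

```python
import torch
import torch.nn as nn

class NeuralMemory(nn.Module):
    """Illustrative long-term memory: a small MLP whose weights are
    updated at test time, one gradient step per incoming token."""
    def __init__(self, d: int, lr: float = 1e-2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 2 * d), nn.SiLU(), nn.Linear(2 * d, d))
        self.lr = lr

    def memorize(self, key: torch.Tensor, value: torch.Tensor) -> None:
        # "Surprise" = how badly the memory currently reconstructs
        # value from key; a large loss yields a large weight update.
        loss = (self.net(key) - value).pow(2).mean()
        grads = torch.autograd.grad(loss, self.net.parameters())
        with torch.no_grad():
            for p, g in zip(self.net.parameters(), grads):
                p -= self.lr * g  # plain SGD step; the paper adds momentum/decay

    def recall(self, query: torch.Tensor) -> torch.Tensor:
        return self.net(query)

mem = NeuralMemory(d=64)
y_t = torch.randn(1, 64)  # stand-in for the attention (STM) output at step t
mem.memorize(y_t, y_t)    # store the new experience during inference
print(mem.recall(y_t).shape)
```

Unlike a key-value cache, what persists here is not the tokens themselves but updated weights, which is what lets the model “learn on the fly” from data it sees only at deployment time.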