|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Cryptocurrency News Articles
Chain of Thought: Reasoning Emerges in Language Models
Jan 29, 2025 at 05:00 am
The new models trained to express extended chain of thought are going to generalize outside of their breakthrough domains of code and math.

This post is early to accommodate some last minute travel on my end!
The new models trained to express extended chain of thought are going to generalize outside of their breakthrough domains of code and math. The “reasoning” process of language models that we use today is chain of thought reasoning. We ask the model to work step by step because it helps it manage complexity, especially in domains where the answer requires precision across multiple specific tokens. The domains where chain of thought (CoT) is most useful today are code, mathematics, and other “reasoning” tasks1. These are the domains where models like o1, R1, Gemini-Thinking, etc. were designed for.
Different intelligences reason in different ways that correspond to how they store and manipulate information. Humans compress a lifetime of experience into our spectacular, low-power brains that draw on past experience almost magically. The words that follow in this blog are also autoregressive, like the output of a language model, but draw on hours and hours of background processing as I converge on this argument.
Language models, on the other hand, are extremely general and do not today have architectures (or use-cases) that continually re-expose them to relevant problems and fold information back in a compressed form. Language models are very large, sophisticated, parametric probability distributions. All of their knowledge and information processing power is stored in the raw weights. Therein, they need a way of processing information that matches this. Chain of thought is that alignment.
Chain of thought reasoning allows information to be naturally processed in smaller chunks, allowing the large, brute force probability distribution to work one token at a time. Chain of thought, while allowing more compute per important token, also allows the models to store intermediate information in their context window without needing explicit recurrence.
Recurrence is required for reasoning and this can either happen in the parameter or state-space. Chain of thoughts with transformers handles all of this in the state-space of the problems. The humans we look at as the most intelligent have embedded information directly in the parameters of our brains that we can draw on.
Here is the only assumption of this piece — chain of thought is a natural fit for language models to “reason” and therefore one should be optimistic about training methods that are designed to enhance it generalizing to many domains.2 By the end of 2025 we should have ample evidence of this given the pace of the technological development.
If the analogies of types of intelligence aren’t convincing enough, a far more practical way to view the new style of training is a method that teaches the model to be better at allocating more compute to harder problems. If the skill is compute allocation, it is fundamental to the models handling a variety of tasks. Today’s reasoning models do not solve this perfectly, but they open the door for doing so precisely.
The nature of this coming generalization is not that these models are one size fits all, best in all cases: speed, intelligence, price, etc. There’s still no free lunch. A realistic outcome for reasoning heavy models in the next 0-3 years is a world where:
Reasoning trained models are superhuman on tasks with verifiable domains, like those with initial progress: Code, math, etc.
Reasoning trained models are well better in peak performance than existing autoregressive models in many domains we would not expect and are not necessarily verifiable.
Reasoning trained models are still better in performance at the long-tail of tasks, but worse in cost given the high inference costs of long-context.
Many of the leading figures in AI have been saying for quite some time that powerful AI is going to be “spikey" when it shows up — meaning that the capabilities and improvements will vary substantially across domains — but encountering this reality is very unintuitive.
Some evidence for generalization of reasoning models already exists.
OpenAI has already published multiple safety-oriented research projects with their new reasoning models in Deliberative Alignment: Reasoning Enables Safer Language Models and Trading Inference-Time Compute for Adversarial Robustness. These papers show their new methods can be translated to various safety domains, i.e. model safety policies and jailbreaking. The deliberative alignment paper shows them integrating a softer reward signal into the reasoning training — having a language model check how the safety policies apply to outputs.
An unsurprising quote from the deliberative alignment release related to generalization:
we find that deliberative alignment enables strong generalization to out-of-distribution safety scenarios.
Safety, qualitatively, is very orthogonal to traditional reasoning problems. Safety is very subjective to the information provided and subtle context, where math and coding problems are often about many small, forward processing steps towards a final goal. More behaviors will fit in between those.
This generative verifier for safety is not a ground truth signal and could theoretically be subject to reward hacking, but it was avoided. Generative verifiers will be crucial to expanding this training to countless domains — they’re easy to use and largely a new development
Disclaimer:info@kdj.com
The information provided is not trading advice. kdj.com does not assume any responsibility for any investments made based on the information provided in this article. Cryptocurrencies are highly volatile and it is highly recommended that you invest with caution after thorough research!
If you believe that the content used on this website infringes your copyright, please contact us immediately (info@kdj.com) and we will delete it promptly.
-
-
- Consensus 2026 Miami: Web3, Blockchain, Cryptocurrency, NFTs, Metaverse, Conference, May 5th — Where Wall Street Meets the Digital Frontier
- May 01, 2026 at 11:27 pm
- Miami buzzes as Consensus 2026 approaches on May 5th, highlighting Web3, blockchain, crypto, NFTs, and the metaverse's shift from hype to institutional and sustainable reality.
-
-
- Bitcoin Miners Electrify the Grid: Ohio Gas Plant Acquisition Powers Up a New Era for Digital Gold
- Apr 30, 2026 at 10:38 pm
- The Bitcoin mining industry is undergoing a significant transformation, with major players aggressively expanding operations and strategically acquiring energy assets like Ohio gas plants to solidify their future in the digital economy.
-
-
- Solana's Slippery Slope: Price Prediction Points to Resistance Loss and Potential Further Drops
- Apr 30, 2026 at 09:08 pm
- Solana is struggling to break key resistance, signaling potential downside. Repeated rejections at $86-$88, coupled with a broken short-term pattern, point to targets as low as $67, or even $40, as sellers maintain control. Investors should watch critical support levels closely.
-
-
- NYC's New Beat: Staking Systems, USD1, and Governance Drive Crypto's Next Wave
- Apr 30, 2026 at 03:02 pm
- From lucrative USD1 earning events to robust governance models, the crypto sphere is buzzing with innovations reshaping how we engage with digital assets, focusing on long-term commitment and stablecoin utility.
-
- OKX Unveils Agent Payments Protocol: Ushering in a New Era of AI Transactions
- Apr 30, 2026 at 02:53 pm
- OKX launches its Agent Payments Protocol (APP), an open standard for AI-driven commerce, enabling agents to manage full business cycles. Explore the implications for AI transactions and agentic payments.

































