PENCIL: Long Thoughts with Short Memory

2025/05/13 08:26

Recent large language models (LLMs), such as OpenAI’s o1/o3, DeepSeek’s R1, and Anthropic’s Claude 3.7, demonstrate that allowing a model to think deeper and longer at test time can significantly enhance its reasoning capability. The core approach underlying this deep thinking is called chain-of-thought (CoT): the model iteratively generates intermediate reasoning steps and appends them to the current context until it produces the final answer.
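
The CoT loop described above can be sketched in a few lines. This is a minimal illustration, not any model's actual decoding code: `chain_of_thought`, `toy_next_step`, and the "ANSWER" marker are hypothetical names, and the toy step function merely stands in for an LLM.

```python
from typing import Callable, List


def chain_of_thought(
    prompt: List[str],
    next_step: Callable[[List[str]], str],
    is_final: Callable[[str], bool],
    max_steps: int = 100,
) -> List[str]:
    """Append generated steps to the context until a final answer appears."""
    context = list(prompt)
    for _ in range(max_steps):
        step = next_step(context)   # the "model" reads the whole context
        context.append(step)        # write-only: nothing is ever removed
        if is_final(step):
            break
    return context


def toy_next_step(context: List[str]) -> str:
    """Hypothetical stand-in for an LLM: count down to zero, then answer."""
    n = int(context[-1])
    return "ANSWER" if n == 0 else str(n - 1)


trace = chain_of_thought(["3"], toy_next_step, lambda s: s == "ANSWER")
print(trace)  # ['3', '2', '1', '0', 'ANSWER']
```

Note that the context only grows: every intermediate step stays in it until the end.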

However, as tasks become increasingly complex, the number of steps needed to solve them grows dramatically. For instance, consider solving NP-hard problems using CoT: the reasoning trace would inevitably span exponentially many steps, assuming a fixed-size Transformer as the base model and P ≠ NP. This raises an important question:

Will CoT-based test-time scaling hit hard ceilings?

Unfortunately, probably yes. Several limitations emerge for harder tasks: (1) chains will inevitably exceed the model’s context window, (2) critical information becomes buried among the numerous preceding tokens and nearly impossible to retrieve, and (3) the cost of self-attention, which grows with context length, makes generating each new token prohibitively expensive.
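
To make point (3) concrete, here is a back-of-the-envelope cost model. It assumes a decoder with cached keys and values, where generating the i-th token attends to the i tokens currently in context, and it counts attended positions only (feed-forward cost and constant factors are ignored). The function names and the 100-token cap are illustrative assumptions, not figures from the paper.

```python
def full_context_reads(n_tokens: int) -> int:
    """Total positions attended to when token i attends to all i tokens so far."""
    return n_tokens * (n_tokens + 1) // 2


def bounded_context_reads(n_tokens: int, max_context: int) -> int:
    """Same count when the context never holds more than max_context tokens."""
    return sum(min(i, max_context) for i in range(1, n_tokens + 1))


n = 1000
print(full_context_reads(n))          # 500500
print(bounded_context_reads(n, 100))  # 95050
```

Under this simplified model, the total cost of an ever-growing context is quadratic in the number of generated tokens, while a context that stays short keeps it close to linear.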

In this article, we challenge the conventional “write-only” CoT reasoning paradigm that dominates current LLM architectures, from both theoretical and practical perspectives. We then explore a fundamentally different reasoning approach that allows an LLM not only to generate thoughts but also to erase them. This capacity for erasure not only offers significant practical benefits in performance and efficiency, but also proves fundamental to achieving optimal reasoning efficiency from a computational-theory perspective.

This post is based on the paper C. Yang et al., “PENCIL: Long thoughts with short memory,” accepted at the International Conference on Machine Learning (ICML) 2025, a collaboration with Nathan Srebro, David McAllester, and Zhiyuan Li. Code is also available.

Not Everything Needs to Be Remembered

The idea of selectively discarding information has deep roots in computer science history, from the earliest computational models to modern systems. The classic Turing machine overwrites symbols on its tape rather than preserving every state; programming languages reclaim memory through stack frames that are automatically released when functions complete their execution; and modern garbage collectors continuously identify and remove objects no longer accessible to the program. These mechanisms weren’t merely efficiency optimizations — they were essential design choices that made complex computation possible within finite resources.

This idea also applies to human reasoning. In theorem proving, once a lemma is established, we discard its detailed derivation while preserving the result; when exploring problem-solving approaches, we simply mark unproductive paths as “failed” without retaining their full traces. Throughout complex reasoning, we naturally compress information, retaining conclusions while discarding the scaffolding used to reach them.

✏️ PENCIL: A New Reasoning Paradigm

Therefore, we propose ✏️ PENCIL, a new reasoning paradigm for LLMs. Unlike ✒️ CoT, which only generates thoughts, PENCIL recursively generates and erases thoughts until it reaches the final answer. It maintains only the minimal context required for generating future thoughts, so the model can think longer and deeper to solve harder tasks with a shorter working memory. The following figure illustrates how PENCIL works.

How Do Models Erase Thoughts?

PENCIL’s erasure mechanism draws on two classical ideas. The first is rewriting rules from logic and classical automated theorem proving, where predefined rules are applied repeatedly to simplify complex logical or arithmetic expressions into canonical forms until a final answer is reached. The second is function calls in functional programming languages, which create stack frames to store local variables when a function is called and release the corresponding memory when it returns, automatically discarding intermediate states that are no longer needed.

Specifically, we introduce three special tokens, [CALL], [SEP], and [RETURN], and use the following reduction rule to implement erasure:

C [CALL] T [SEP] A [RETURN]  ⇒  C A

where C stands for context, T stands for intermediate thoughts, and A stands for answer. Whenever the generated sequence completely matches the pattern on the left, PENCIL triggers the reduction rule, erasing thoughts and merging the answer back into the context. It is important to note that C, T and A can themselves contain special tokens, thereby supporting recursive structures similar to nested function calls — for example, C may contain another [CALL] token, indicating that a new thinking subroutine has been initiated.
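
The reduction described here can be sketched over a list of string tokens. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `reduce_once` and `_last_index` are hypothetical, and the matching strategy (scan back from the trailing [RETURN] to the nearest [SEP], then to the nearest [CALL] before it) is one plausible reading of "completely matches the pattern".

```python
from typing import List, Optional

CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"


def _last_index(tokens: List[str], target: str, end: int) -> Optional[int]:
    """Index of the last occurrence of target in tokens[:end], or None."""
    for i in range(end - 1, -1, -1):
        if tokens[i] == target:
            return i
    return None


def reduce_once(tokens: List[str]) -> List[str]:
    """Rewrite context + call + thoughts + sep + answer + return to context + answer."""
    if not tokens or tokens[-1] != RETURN:
        return tokens
    end = len(tokens) - 1                    # position of the trailing [RETURN]
    sep = _last_index(tokens, SEP, end)      # assumed: nearest [SEP] before it
    if sep is None:
        return tokens
    call = _last_index(tokens, CALL, sep)    # assumed: nearest [CALL] before that
    if call is None:
        return tokens
    context, answer = tokens[:call], tokens[sep + 1:end]
    return context + answer                  # the thoughts in between are dropped


flat = ["Q", CALL, "t1", "t2", SEP, "a", RETURN]
nested = ["Q", CALL, "outer", CALL, "inner", SEP, "a", RETURN]
print(reduce_once(flat))    # ['Q', 'a']
print(reduce_once(nested))  # ['Q', '[CALL]', 'outer', 'a']
```

In the nested case only the innermost call is reduced; the outer [CALL] stays in the context, which corresponds to a subroutine that is still running.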

How to Use PENCIL?

PENCIL’s erasure mechanism flexibly supports various reasoning patterns, such as:

1️⃣ Task Decomposition: Use [CALL] to initiate a subproblem, generate intermediate results, and then use [SEP] and [RETURN] to merge the output and erase the subproblem’s reasoning details;

2️⃣ Branch and Backtrack: Use a [CALL], [SEP], [RETURN] triplet to manage an exploration branch in a search tree, erasing invalid paths upon conflicts or failures;

3️⃣ Summarization / Tail Recursion: Condense a lengthy reasoning trace into a concise summary, similar to tail-recursion optimization in programming:

where T represents the original complex reasoning process (or a more difficult problem), and T' represents the summarized or simplified
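
The three patterns can be simulated with scripted token streams. This is a rough sketch under assumptions: the streams stand in for model output, `reduce_once` and `run` are hypothetical helpers, the reduction is matched from the trailing [RETURN] back to the nearest [SEP] and then the nearest [CALL], and the tail-recursion stream assumes the answer slot itself opens a new [CALL] followed by the summary.

```python
from typing import List, Tuple

CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"


def reduce_once(tokens: List[str]) -> List[str]:
    """Erase the innermost call's thoughts, matched from the trailing [RETURN]."""
    if not tokens or tokens[-1] != RETURN or SEP not in tokens:
        return tokens
    end = len(tokens) - 1
    sep = end - 1 - tokens[:end][::-1].index(SEP)    # nearest [SEP] before [RETURN]
    if CALL not in tokens[:sep]:
        return tokens
    call = sep - 1 - tokens[:sep][::-1].index(CALL)  # nearest [CALL] before it
    return tokens[:call] + tokens[sep + 1:end]


def run(stream: List[str]) -> Tuple[List[str], int]:
    """Feed a scripted token stream, reducing whenever [RETURN] appears."""
    context: List[str] = []
    peak = 0
    for tok in stream:
        context.append(tok)
        peak = max(peak, len(context))   # longest context ever held
        if tok == RETURN:
            context = reduce_once(context)
    return context, peak


# 1. Task decomposition: two subproblems, each erased after returning.
decompose = ["solve", CALL, "sub1", "work", "work", SEP, "r1", RETURN,
             CALL, "sub2", "work", "work", SEP, "r2", RETURN, "combine"]
# 2. Branch and backtrack: a failed branch collapses to a short note.
backtrack = ["x=?", CALL, "try x=1", "conflict", SEP, "x=1 fails", RETURN]
# 3. Tail recursion (assumed form): the answer itself opens a new call.
summarize = ["Q", CALL, "long", "trace", SEP, CALL, "summary", RETURN]

final, peak = run(decompose)
print(final, peak, len(decompose))  # ['solve', 'r1', 'r2', 'combine'] 9 16
print(run(backtrack)[0])            # ['x=?', 'x=1 fails']
print(run(summarize)[0])            # ['Q', '[CALL]', 'summary']
```

In the decomposition run, 16 tokens are generated but the context never holds more than 9 at once, which is the "long thoughts, short memory" effect in miniature.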
