PENCIL: Long Thoughts with Short Memory

2025/05/13 08:26

Recent large language models (LLMs) — such as OpenAI’s o1/o3, DeepSeek’s R1, and Anthropic’s Claude 3.7 — demonstrate that allowing the model to think deeper and longer at test time can significantly enhance its reasoning capability. The core approach underlying this deep-thinking capability is called chain-of-thought (CoT), where the model iteratively generates intermediate reasoning steps and appends them to the current context until producing the final answer.
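
As a rough sketch of this loop in Python (the model, generate_step, and is_final names are hypothetical placeholders for step-level decoding, not the API of any particular LLM):

```python
# Minimal sketch of the "write-only" chain-of-thought loop: thoughts are
# appended to the context and never removed. `model.generate_step` and
# `model.is_final` are hypothetical stand-ins, not a real library API.

def chain_of_thought(model, prompt: str, max_steps: int = 1000) -> str:
    context = prompt
    for _ in range(max_steps):
        step = model.generate_step(context)  # next intermediate thought
        context += step                      # appended, never erased
        if model.is_final(step):             # stop once the answer appears
            return step
    raise RuntimeError("step budget exhausted before an answer")
```

Note that the context only ever grows; this monotone growth is exactly where the limitations discussed below come from.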

However, as tasks become increasingly complex, the number of steps needed to solve them grows dramatically. For instance, consider solving NP-hard problems using CoT: assuming a fixed-size Transformer as the base model and P ≠ NP, the reasoning trace would inevitably span exponentially many steps. This raises an important question:

Will CoT-based test-time scaling hit hard ceilings?

Unfortunately, probably yes. Various limitations emerge for harder tasks: (1) chains will inevitably exceed the model’s context window, (2) critical information becomes buried and nearly impossible to retrieve from the mass of preceding tokens, and (3) the quadratic cost of self-attention makes generating each new token prohibitively expensive.

In this article, we challenge the conventional “write-only” CoT reasoning paradigm that dominates current LLM architectures, from both theoretical and practical perspectives. Furthermore, we will explore a fundamentally different reasoning approach that allows the LLM not only to generate thoughts, but also to erase them. This capacity for thought erasure not only offers significant practical benefits in performance and efficiency, but also proves fundamental for achieving optimal reasoning efficiency from the perspective of computational theory.

This post is based on the paper C. Yang et al., “PENCIL: Long thoughts with short memory,” accepted at the International Conference on Machine Learning (ICML) 2025, a collaboration with Nathan Srebro, David McAllester, and Zhiyuan Li. Code is also available.

Not Everything Needs to Be Remembered

The idea of selectively discarding information has deep roots in computer science history, from the earliest computational models to modern systems. The classic Turing machine overwrites symbols on its tape rather than preserving every state; programming languages reclaim memory through stack frames that are automatically released when functions complete their execution; and modern garbage collectors continuously identify and remove objects no longer accessible to the program. These mechanisms weren’t merely efficiency optimizations — they were essential design choices that made complex computation possible within finite resources.

This idea also applies to human reasoning. In theorem proving, once a lemma is established, we discard its detailed derivation while preserving the result; when exploring problem-solving approaches, we simply mark unproductive paths as “failed” without retaining their full traces. Throughout complex reasoning, we naturally compress information, retaining conclusions while discarding the scaffolding used to reach them.

✏️ PENCIL: A New Reasoning Paradigm

Therefore, we propose ✏️ PENCIL, a new reasoning paradigm for LLMs. Unlike ✒️ CoT, which only generates thoughts, PENCIL recursively generates and erases thoughts until reaching the final answer. It maintains only the minimal context required for generating future thoughts, so the model can think longer and deeper, solving harder tasks with a shorter working memory. The following figure illustrates how PENCIL works.

How Do Models Erase Thoughts?

PENCIL’s erasure mechanism draws on two classical ideas. The first is rewriting rules, from logic and classical automated theorem proving, which repeatedly apply predefined rules to simplify complex logical or arithmetic expressions into canonical forms until a final answer is reached. The second is stack frames, from functional programming languages, which are created to store local variables when a function is called and released when it returns, automatically discarding intermediate states that are no longer needed.

Specifically, we introduce three special tokens, called [CALL], [SEP], and [RETURN], and use the following reduction rule to implement erasure:

C [CALL] T [SEP] A [RETURN]   ⇒   C A

where C stands for context, T stands for intermediate thoughts, and A stands for answer. Whenever the generated sequence completely matches the pattern on the left, PENCIL triggers the reduction rule, erasing thoughts and merging the answer back into the context. It is important to note that C, T and A can themselves contain special tokens, thereby supporting recursive structures similar to nested function calls — for example, C may contain another [CALL] token, indicating that a new thinking subroutine has been initiated.
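
A minimal sketch of this reduction over a flat token list (our own illustration, not the paper’s released implementation; it assumes the generated sequence is well formed):

```python
# Sketch of PENCIL's reduction rule over a flat list of tokens:
#     C [CALL] T [SEP] A [RETURN]  =>  C A
# When the first [RETURN] appears, we locate the last [SEP] before it and
# the last [CALL] before that [SEP]; this innermost-match order also handles
# nested calls and the tail-recursion pattern discussed below.

CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"

def reduce_once(tokens: list[str]) -> list[str] | None:
    """Apply the rule once; return None when no full pattern is present."""
    if RETURN not in tokens:
        return None
    r = tokens.index(RETURN)                           # first [RETURN]
    s = max(i for i in range(r) if tokens[i] == SEP)   # matching [SEP]
    c = max(i for i in range(s) if tokens[i] == CALL)  # matching [CALL]
    answer = tokens[s + 1 : r]                         # A
    return tokens[:c] + answer + tokens[r + 1 :]       # C followed by A

def reduce_all(tokens: list[str]) -> list[str]:
    """Reduce repeatedly until no pattern remains."""
    while (step := reduce_once(tokens)) is not None:
        tokens = step
    return tokens

# Example: the thoughts "t1 t2" are erased; only the answer survives.
trace = "ctx [CALL] t1 t2 [SEP] ans [RETURN]".split()
assert reduce_all(trace) == ["ctx", "ans"]
```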

How to Use PENCIL?

PENCIL’s erasure mechanism flexibly supports various reasoning patterns, such as:

1️⃣ Task Decomposition: Using [CALL] to initiate a subproblem, generating intermediate results, and then using [SEP] and [RETURN] to merge the output and erase the subproblem’s reasoning details;

2️⃣ Branch and Backtrack: Using a [CALL], [SEP], [RETURN] triplet to manage an exploration branch in a search tree, erasing invalid paths upon conflicts or failures.

3️⃣ Summarization / Tail Recursion: Condensing a lengthy reasoning trace into a concise summary, similar to tail-recursion optimization in programming:

C [CALL] T [SEP] [CALL] T' [RETURN]   ⇒   C [CALL] T'

where T represents the original complex reasoning process (or a more difficult problem), and T' represents the summarized or simplified reasoning.
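
Continuing with the hypothetical reduce_all sketch from earlier, here is how illustrative token traces for patterns 1️⃣ and 3️⃣ would reduce (the token contents are invented for illustration):

```python
# Pattern 1, task decomposition: the subproblem's reasoning is erased and
# only its merged answer remains in the context.
decomp = "solve [CALL] try step1 step2 [SEP] lemma [RETURN]".split()
assert reduce_all(decomp) == ["solve", "lemma"]

# Pattern 3, summarization / tail recursion: the answer itself opens a new
# [CALL], i.e.  C [CALL] T [SEP] [CALL] T' [RETURN]  =>  C [CALL] T',
# replacing the long trace T with its summary T' while thinking continues.
summarize = "ctx [CALL] long trace [SEP] [CALL] summary [RETURN]".split()
assert reduce_all(summarize) == ["ctx", "[CALL]", "summary"]
```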
