PENCIL: Long Thoughts with Short Memory

2025/05/13 08:26

Recent large language models (LLMs), such as OpenAI’s o1/o3, DeepSeek’s R1, and Anthropic’s Claude 3.7, demonstrate that allowing the model to think deeper and longer at test time can significantly enhance its reasoning capability. The core approach underlying this deep-thinking capability is chain-of-thought (CoT), where the model iteratively generates intermediate reasoning steps and appends them to the current context until producing the final answer.

However, as tasks become increasingly complex, the number of steps needed to solve them grows dramatically. For instance, consider solving NP-hard problems with CoT: assuming a fixed-size Transformer as the base model and P ≠ NP, the reasoning trace would inevitably span exponentially many steps. This raises an important question:

Will CoT-based test-time scaling hit hard ceilings?

Unfortunately, probably yes. Various limitations emerge for harder tasks: (1) chains inevitably exceed the model’s context window, (2) critical information becomes buried and nearly impossible to retrieve from among the numerous preceding tokens, and (3) the quadratic cost of self-attention makes generating each new token prohibitively expensive.

In this article, we challenge the conventional “write-only” CoT reasoning paradigm that dominates current LLM architectures, from both theoretical and practical perspectives. Furthermore, we explore a fundamentally different reasoning approach that allows an LLM not only to generate thoughts, but also to erase them. This capacity for thought erasure not only offers significant practical benefits in performance and efficiency, but also proves fundamental for achieving optimal reasoning efficiency from the perspective of computational theory.

This post is based on the paper C. Yang et al., “PENCIL: Long thoughts with short memory,” accepted at the International Conference on Machine Learning (ICML) 2025, a collaboration with Nathan Srebro, David McAllester, and Zhiyuan Li. Code is also available.

Not Everything Needs to Be Remembered

The idea of selectively discarding information has deep roots in computer science history, from the earliest computational models to modern systems. The classic Turing machine overwrites symbols on its tape rather than preserving every state; programming languages reclaim memory through stack frames that are automatically released when functions complete their execution; and modern garbage collectors continuously identify and remove objects no longer accessible to the program. These mechanisms weren’t merely efficiency optimizations — they were essential design choices that made complex computation possible within finite resources.

This idea also applies to human reasoning. In theorem proving, once a lemma is established, we discard its detailed derivation while preserving the result; when exploring problem-solving approaches, we simply mark unproductive paths as “failed” without retaining their full traces. Throughout complex reasoning, we naturally compress information, retaining conclusions while discarding the scaffolding used to reach them.

✏️ PENCIL: A New Reasoning Paradigm

Therefore, we propose ✏️ PENCIL, a new reasoning paradigm for LLMs. Unlike ✒️ CoT, which only generates thoughts, PENCIL recursively generates and erases thoughts until reaching the final answer. It maintains only the minimal context required for generating future thoughts, so the model can think longer and deeper to solve harder tasks using shorter working memory. The sketch below illustrates how PENCIL works.
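
As a stand-in for the original figure, here is a minimal sketch of PENCIL’s inference loop. The helper names (`model.sample_next`, `reduce_once`, the `[RETURN]` and `[EOS]` strings) are our own illustrative assumptions, not the paper’s released code; the erasure rule itself is defined in the next section.

```python
def pencil_generate(model, prompt_tokens, max_steps=10_000):
    """Sketch of PENCIL inference: decode autoregressively as usual,
    but whenever a [RETURN] token completes the erasure pattern,
    rewrite the context in place so it stays short."""
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        token = model.sample_next(tokens)  # hypothetical decoding call
        tokens.append(token)
        if token == "[RETURN]":
            tokens = reduce_once(tokens)   # erasure rule, defined below
        elif token == "[EOS]":
            break
    return tokens
```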

How Do Models Erase Thoughts?

PENCIL’s erasure mechanism draws on two classical ideas. The first is rewriting rules from logic and classical automated theorem proving, which repeatedly apply predefined rules to simplify complex logical or arithmetic expressions into canonical forms until a final answer is reached. The second is function calls in functional programming languages, which create stack frames to store local variables when a function is called and release the corresponding memory when it returns, automatically discarding intermediate states that are no longer needed.

Specifically, we introduce three special tokens, called [CALL], [SEP], and [RETURN], and use the following reduction rule to implement erasure:

C [CALL] T [SEP] A [RETURN]  →  C A

where C stands for the context, T stands for the intermediate thoughts, and A stands for the answer. Whenever the generated sequence completely matches the pattern on the left, PENCIL triggers the reduction rule, erasing the thoughts and merging the answer back into the context. It is important to note that C, T, and A can themselves contain special tokens, thereby supporting recursive structures similar to nested function calls: for example, C may contain another [CALL] token, indicating that a new thinking subroutine has been initiated.
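
To make the rule concrete, here is a minimal sketch of the reduction over a token list (our own illustrative code, assuming special tokens are plain strings and a well-formed pattern; not the paper’s implementation):

```python
CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"

def reduce_once(tokens):
    """Apply C [CALL] T [SEP] A [RETURN] -> C A once, matching the
    innermost (most recent) [CALL]/[SEP] pair so that nested calls
    reduce correctly."""
    if not tokens or tokens[-1] != RETURN:
        return tokens                      # pattern not complete yet
    # last [SEP] before [RETURN], then last [CALL] before that [SEP]
    sep = max(i for i, t in enumerate(tokens[:-1]) if t == SEP)
    call = max(i for i, t in enumerate(tokens[:sep]) if t == CALL)
    C = tokens[:call]                      # context, kept
    A = tokens[sep + 1 : -1]               # answer, kept
    return C + A                           # thoughts T are erased

# Example: 2 + 3 * 4, with the multiplication done in a subroutine.
seq = ["2+3*4=?", CALL, "3*4", "is", "12", SEP, "12", RETURN]
print(reduce_once(seq))  # ['2+3*4=?', '12'] -- ready to compute 2+12
```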

How to Use PENCIL?

PENCIL’s erasure mechanism flexibly supports various reasoning patterns (a toy trace is sketched after the list), such as:

1️⃣ Task Decomposition: Using [CALL] to initiate subproblems, generating intermediate results, and then using [SEP] and [RETURN] to merge outputs and erase the subproblem’s reasoning details;

2️⃣ Branch and Backtrack: Using a [CALL], [SEP], [RETURN] triplet to manage an exploration branch in a search tree, erasing invalid paths upon conflicts or failures;

3️⃣ Summarization / Tail Recursion: Condensing a lengthy reasoning trace into a concise summary, similar to tail-recursion optimization in programming:

C [CALL] T [SEP] [CALL] T' [RETURN]  →  C [CALL] T'

where T represents the original complex reasoning process (or a more difficult problem), and T' represents the summarized or simplified version, from which reasoning then continues.
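
As a toy illustration of pattern 2️⃣ (our own example, reusing the `reduce_once` sketch above), a failed search branch is erased as soon as the branch returns what was learned from it:

```python
# Hypothetical branch-and-backtrack trace for a tiny SAT-style guess.
seq = ["solve:", "x or y",
       CALL, "try", "x=False", "...", "contradiction",
       SEP, "x=True", RETURN]
print(reduce_once(seq))
# ['solve:', 'x or y', 'x=True'] -- the failed exploration is gone,
# and only its conclusion survives in the working context.
```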
