Cryptocurrency News Articles

Thinkless: A Framework for Dynamically Choosing Between Short-Form and Long-Form Reasoning in Language Models

2025/05/23 13:59

The effectiveness of language models depends on their ability to simulate human-like step-by-step reasoning. However, these reasoning sequences are resource-intensive and wasteful for simple questions that do not require elaborate computation. A lack of awareness of task complexity is one of the core challenges in these models: they often default to detailed reasoning even for queries that could be answered directly.

Researchers from the National University of Singapore have developed a new framework called Thinkless that enables a language model to autonomously decide whether to use short or long-form reasoning, tailoring its response to the complexity of the task at hand.


The framework, which is built on reinforcement learning, introduces two special control tokens:


* <short> for concise answers and
* <think> for detailed responses.
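As a rough illustration of how such control tokens can steer inference, the model can emit the mode token first and the decoder can budget the rest of the generation accordingly. A minimal sketch, assuming a Hugging Face-style causal LM whose vocabulary already contains <short> and <think> (the checkpoint path and budget values are hypothetical, not from the paper's released code):

```python
# Minimal sketch of control-token routing at inference time.
# Assumes "<short>" and "<think>" were added to the tokenizer/model
# vocabulary during training; the model path is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/thinkless-style-model"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def answer(prompt: str, max_short: int = 256, max_think: int = 4096) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    # Step 1: let the model emit the control token that selects the mode.
    first = model.generate(**inputs, max_new_tokens=1)
    mode = tokenizer.decode(first[0, -1])
    # Step 2: budget the rest of the decode according to the chosen mode.
    budget = max_think if mode.strip() == "<think>" else max_short
    out = model.generate(first, max_new_tokens=budget)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```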

By incorporating a novel algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless separates the training focus between selecting the reasoning mode and improving the accuracy of the generated response.


This design prevents the model from falling into one-dimensional behavior and enables adaptive reasoning tailored to each query.


The methodology involves two stages: warm-up distillation and reinforcement learning. In the distillation phase, Thinkless is trained using outputs from two expert models—one specializing in short responses and the other in detailed reasoning. This stage helps the model establish a firm link between the control token and the desired reasoning format.

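One plausible way to assemble the warm-up set is to prefix each expert trace with the control token that names its style and fine-tune on the concatenation, so the token and the format become linked. A minimal sketch under that assumption (the pairing logic and field names are hypothetical):

```python
# Sketch of warm-up distillation data: each expert output is prefixed
# with the control token matching its reasoning style, so supervised
# fine-tuning ties token and format together. Field names are hypothetical.
def build_warmup_examples(problems, short_expert, long_expert):
    examples = []
    for p in problems:
        # Short-form expert: concise final answers.
        examples.append({"prompt": p, "target": "<short>" + short_expert(p)})
        # Long-form expert: detailed chain-of-thought traces.
        examples.append({"prompt": p, "target": "<think>" + long_expert(p)})
    return examples
```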

The reinforcement learning stage then fine-tunes the model’s ability to decide which reasoning mode to use. DeGRPO decomposes the learning into two separate objectives: one for training the control token and another for refining the response tokens.

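In spirit, the decoupling handles the loss on the single mode token separately from the loss on the response tokens, so neither objective drowns out the other. A schematic sketch of that split, with a hypothetical weighting coefficient `alpha` (not the paper's exact formulation):

```python
import torch

def degrpo_style_loss(logprobs, advantages, mode_index, alpha=1.0):
    """Schematic decoupled policy-gradient loss (illustrative, not DeGRPO's
    exact math).

    logprobs:   (T,) log-probabilities of the sampled tokens
    advantages: (T,) per-token group-relative advantages
    mode_index: position of the control token in the sequence
    alpha:      hypothetical weight balancing the two objectives
    """
    pg = -(logprobs * advantages)          # standard policy-gradient terms
    mode_loss = pg[mode_index]             # objective 1: mode selection
    resp_mask = torch.ones_like(pg, dtype=torch.bool)
    resp_mask[mode_index] = False
    resp_loss = pg[resp_mask].mean()       # objective 2: response quality,
                                           # averaged so long traces do not
                                           # dominate the gradient
    return alpha * mode_loss + resp_loss
```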

This approach avoids the gradient imbalances of earlier models, where longer responses would overpower the learning signal, leading to a collapse in reasoning diversity. Thinkless ensures that both <short> and <think> tokens receive balanced updates, promoting stable learning across response types.

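To see why a naive per-token sum collapses, note how little of the gradient the single mode token contributes (numbers invented for illustration):

```python
# Why naive per-token summing drowns the mode token (illustrative numbers).
long_trace_tokens = 1000   # a detailed <think> response
mode_tokens = 1            # the single control token
# Summed over the sequence, the mode decision contributes only
# 1 / (1000 + 1), about 0.1% of the gradient signal; the response tokens
# supply the rest, so the mode policy barely moves.
print(mode_tokens / (long_trace_tokens + mode_tokens))  # ~0.000999
```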

When evaluated, Thinkless significantly reduced long-form reasoning while preserving high accuracy. On the Minerva Algebra benchmark, the model used the <think> token in only 25.88% of cases while achieving 94.59% accuracy. In contrast, conventional reasoning models had to use extended chains of thought much more frequently.


On the AIME 2024 dataset, Thinkless reached a 27.33% accuracy rate with 100% usage of the <think> mode, showing that it could maintain performance when full reasoning was necessary. On the GSM8K dataset, it invoked <think> only 13.31% of the time, yet still achieved 84.18% accuracy.


These results reflect the model’s ability to handle simple and complex queries with appropriate reasoning depth, cutting down on unnecessary token generation by as much as 90% in some tasks.

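The reported savings follow directly from the mode-usage rates. A back-of-the-envelope estimate using the GSM8K figure above, with illustrative average trace lengths (the article does not publish these exact token counts):

```python
# Expected decode length under adaptive mode selection.
p_think = 0.1331                   # <think> usage on GSM8K, from the article
len_think, len_short = 2000, 100   # hypothetical average token counts
adaptive = p_think * len_think + (1 - p_think) * len_short
always_think = len_think           # baseline: always reason in long form
print(f"tokens saved: {1 - adaptive / always_think:.0%}")  # ~82%
```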

This study, titled "Thinkless: Equipping Language Models for Autonomous Depth Control in Reasoning," is a valuable contribution to the field of natural language processing, presenting a practical and efficient method for optimizing large language models for diverse and complex tasks.


Disclaimer: info@kdj.com

The information provided is not trading advice. kDJ.com assumes no liability for any investments made based on the information provided in this article. Cryptocurrencies are highly volatile; please research thoroughly and invest with caution!

If you believe content used on this website infringes your copyright, please contact us immediately (info@kdj.com) and we will remove it promptly.
