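The effectiveness of language models depends on their ability to simulate human-like, step-by-step deduction. However, these reasoning sequences are resource-intensive and can be wasteful for simple questions that do not require elaborate computation. A lack of awareness of task complexity is one of the core challenges in these models: even for queries that could be answered directly, they often default to detailed reasoning.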
Researchers from the National University of Singapore have developed a new framework called Thinkless that enables a language model to autonomously decide whether to use short or long-form reasoning, tailoring its response to the complexity of the task at hand.
The framework, which is built on reinforcement learning, introduces two special control tokens: `<short>` for concise answers and `<think>` for detailed responses.
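As a rough illustration of how a control token at the start of a generation could be read back by downstream code, here is a minimal Python sketch. The helper name `split_mode` and the assumption that the token is the very first thing in the output are illustrative choices, not details confirmed by the article.

```python
SHORT_TOKEN = "<short>"  # control token for concise answers
THINK_TOKEN = "<think>"  # control token for detailed reasoning


def split_mode(generation: str) -> tuple[str, str]:
    """Return (mode, body) for a generation that starts with a control token.

    Hypothetical helper: assumes the control token is the first thing emitted.
    """
    text = generation.lstrip()
    for token, mode in ((SHORT_TOKEN, "short"), (THINK_TOKEN, "think")):
        if text.startswith(token):
            return mode, text[len(token):].strip()
    raise ValueError("generation does not start with a known control token")


if __name__ == "__main__":
    print(split_mode("<short> The answer is 4."))
```

Because the mode is decided by a single leading token, counting how often each mode is chosen reduces to inspecting the first token of each output.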
By incorporating a novel algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless separates the training focus between selecting the reasoning mode and improving the accuracy of the generated response.
This design prevents the model from falling into one-dimensional behavior and enables adaptive reasoning tailored to each query.
The methodology involves two stages: warm-up distillation and reinforcement learning. In the distillation phase, Thinkless is trained using outputs from two expert models—one specializing in short responses and the other in detailed reasoning. This stage helps the model establish a firm link between the control token and the desired reasoning format.
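One way to picture the warm-up distillation data is as paired supervised examples, where each prompt appears once with a short expert answer and once with a long expert answer, each prefixed by its control token. The sketch below assumes that layout; the `Example` class, the `build_warmup_pairs` function, and the exact target format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str
    target: str  # control token followed by the expert model's response


def build_warmup_pairs(prompt: str, short_answer: str, long_answer: str) -> list[Example]:
    """Pair one prompt with both expert outputs (assumed data layout)."""
    return [
        Example(prompt, f"<short>{short_answer}"),  # from the short-response expert
        Example(prompt, f"<think>{long_answer}"),   # from the detailed-reasoning expert
    ]


if __name__ == "__main__":
    for ex in build_warmup_pairs("What is 2 + 2?", "4", "2 plus 2 equals 4. Answer: 4"):
        print(ex)
```

Training on both targets for the same prompt is what would tie each control token to its response format before reinforcement learning begins.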
The reinforcement learning stage then fine-tunes the model’s ability to decide which reasoning mode to use. DeGRPO decomposes the learning into two separate objectives: one for training the control token and another for refining the response tokens.
This approach avoids the gradient imbalances in earlier models, where longer responses would overpower the learning signal, leading to a collapse in reasoning diversity. Thinkless ensures that both the `<short>` and `<think>` tokens receive balanced updates, promoting stable learning across response types.
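The imbalance described above can be made concrete with a simplified sketch. It compares a single objective averaged over all tokens with a decoupled objective that gives the control token its own fixed weight. This is an illustration under stated assumptions, not the paper's exact loss: the weight `alpha` and the function names are hypothetical, and the clipping, importance ratios, and KL terms of real GRPO-style training are omitted.

```python
def joint_loss(control_logp: float, response_logps: list[float], advantage: float) -> float:
    """Single objective: average over the control token and all response tokens."""
    total = control_logp + sum(response_logps)
    return -advantage * total / (1 + len(response_logps))


def decoupled_loss(control_logp: float, response_logps: list[float],
                   advantage: float, alpha: float = 1.0) -> float:
    """Two objectives: a fixed-weight control term plus a length-normalized response term."""
    if not response_logps:
        raise ValueError("response must contain at least one token")
    control_term = -alpha * advantage * control_logp
    response_term = -advantage * sum(response_logps) / len(response_logps)
    return control_term + response_term


def control_share_joint(num_response_tokens: int) -> float:
    """Weight the control token receives under the joint, token-averaged objective."""
    return 1.0 / (1 + num_response_tokens)


if __name__ == "__main__":
    # Under the joint objective the control token's weight shrinks with response length.
    for length in (9, 99, 999):
        print(length, control_share_joint(length))
```

In the joint form the control token's weight depends on how long the response is, whereas in the decoupled form it is set by `alpha` alone, which is the kind of separation the article attributes to DeGRPO.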
When evaluated, Thinkless significantly reduced long-form reasoning while preserving high accuracy. On the Minerva Algebra benchmark, the model used the `<think>` token in only 25.88% of cases while achieving 94.59% accuracy. In contrast, conventional reasoning models had to use extended chains of thought much more frequently.
On the AIME 2024 dataset, Thinkless reached a 27.33% accuracy rate with 100% usage of the reasoning mode, showing that it could maintain performance when full reasoning was necessary. On the GSM8K dataset, it utilized `<think>` only 13.31% of the time, yet still achieved 84.18% accuracy.
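The two figures reported per benchmark, the share of queries answered in reasoning mode and the accuracy, could be computed from per-query records along the following lines. The record format and the `summarize` function are assumptions made for illustration, and the sample data is invented, not taken from the study.

```python
def summarize(records: list[tuple[str, bool]]) -> tuple[float, float]:
    """Return (think_rate, accuracy) from (mode, is_correct) pairs.

    Hypothetical record format: mode is "short" or "think".
    """
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    think_rate = sum(mode == "think" for mode, _ in records) / n
    accuracy = sum(bool(ok) for _, ok in records) / n
    return think_rate, accuracy


if __name__ == "__main__":
    # Toy data for illustration only.
    demo = [("short", True), ("think", True), ("short", False), ("short", True)]
    print(summarize(demo))
```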
These results reflect the model’s ability to handle simple and complex queries with appropriate reasoning depth, cutting down on unnecessary token generation by as much as 90% in some tasks.
This study, titled "Thinkless: Equipping Language Models for Autonomous Depth Control in Reasoning," is a valuable contribution to the field of natural language processing, presenting a practical and efficient method for optimizing large language models for diverse and complex tasks.