Thinkless: A Framework for Dynamically Choosing Between Short and Long-Form Reasoning in Language Models
May 23, 2025 at 01:59 pm
The effectiveness of language models relies on their ability to simulate human-like step-by-step deduction. However, these reasoning sequences are resource-intensive and wasteful for simple questions that do not require elaborate computation. A core challenge is that these models are unaware of task complexity: they often default to detailed reasoning even for queries that could be answered directly.
Researchers from the National University of Singapore have developed a new framework called Thinkless that enables a language model to autonomously decide whether to use short or long-form reasoning, tailoring its response to the complexity of the task at hand.
The framework, which is built on reinforcement learning, introduces two special control tokens (illustrated in the sketch below):

* `<short>`: instructs the model to produce a concise, direct answer
* `<think>`: instructs the model to produce a detailed chain of reasoning
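At inference time, the chosen mode can be read directly off the first generated token. The following is a minimal sketch assuming a Hugging Face-style causal LM trained in this fashion; the model path is a placeholder, not an official release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: substitute a model fine-tuned with <short>/<think> tokens.
MODEL = "path/to/thinkless-style-model"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def answer(question: str, max_new_tokens: int = 1024) -> str:
    inputs = tokenizer(question, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
    # The first control token the model emits reveals which mode it chose.
    mode = "long-form" if completion.lstrip().startswith("<think>") else "short-form"
    print(f"[reasoning mode: {mode}]")
    return completion
```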
By incorporating a novel algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless divides the training focus between selecting the reasoning mode and improving the accuracy of the generated response.
This design prevents the model from falling into one-dimensional behavior and enables adaptive reasoning tailored to each query.
The methodology involves two stages: warm-up distillation and reinforcement learning. In the distillation phase, Thinkless is trained using outputs from two expert models—one specializing in short responses and the other in detailed reasoning. This stage helps the model establish a firm link between the control token and the desired reasoning format.
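Conceptually, the warm-up set pairs every question with both experts' outputs, each prefixed by its control token. The helper below is a hypothetical sketch of that pairing (the expert callables stand in for the two teacher models, not the authors' pipeline):

```python
def build_distillation_pairs(questions, short_expert, reasoning_expert):
    """Pair each question with both expert answers, tagged by control token."""
    examples = []
    for q in questions:
        # Short-form teacher: direct answer, prefixed with <short>.
        examples.append({"prompt": q, "target": "<short>" + short_expert(q)})
        # Long-form teacher: full chain of reasoning, prefixed with <think>.
        examples.append({"prompt": q, "target": "<think>" + reasoning_expert(q)})
    return examples
```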
The reinforcement learning stage then fine-tunes the model’s ability to decide which reasoning mode to use. DeGRPO decomposes the learning into two separate objectives: one for training the control token and another for refining the response tokens.
This approach avoids the gradient imbalance seen in earlier approaches, where longer responses would overpower the learning signal and cause reasoning diversity to collapse. Thinkless instead ensures that both the mode-selection token and the response tokens receive balanced updates, as sketched below.
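A minimal sketch of such a decoupled objective follows, assuming per-token log-probabilities and group-relative advantages have already been computed. The separate per-term normalization and the `alpha` weight are illustrative assumptions, not the authors' released implementation:

```python
import torch

def degrpo_style_loss(logp, advantages, mode_mask, alpha: float = 0.001):
    """Decoupled policy-gradient loss in the spirit of DeGRPO (sketch).

    logp:       (batch, seq) log-probs of sampled tokens under the policy
    advantages: (batch,) group-relative advantage of each sampled response
    mode_mask:  (batch, seq) float mask, 1.0 on the <short>/<think> control
                token, 0.0 on ordinary response tokens
    """
    adv = advantages.unsqueeze(1)          # broadcast advantage over tokens
    resp_mask = 1.0 - mode_mask
    # Normalizing each term by its own token count keeps the single
    # mode-selection token from being drowned out by long responses.
    mode_loss = -(adv * logp * mode_mask).sum() / mode_mask.sum().clamp(min=1)
    resp_loss = -(adv * logp * resp_mask).sum() / resp_mask.sum().clamp(min=1)
    return alpha * mode_loss + resp_loss
```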
When evaluated, Thinkless significantly reduced long-form reasoning while preserving high accuracy. On the Minerva Algebra benchmark, the model invoked the `<think>` mode on only a small fraction of queries, answering most problems in short form with little loss in accuracy.
On the AIME 2024 dataset, Thinkless reached a 27.33% accuracy rate with 100% usage of the `<think>` mode, showing that it could maintain performance when full reasoning was necessary. On the GSM8K dataset, it likewise answered most problems in `<short>` mode, reserving long-form reasoning for the harder cases.
These results reflect the model’s ability to handle simple and complex queries with appropriate reasoning depth, cutting down on unnecessary token generation by as much as 90% in some tasks.
This study, titled "Thinkless: LLM Learns When to Think," is a valuable contribution to the field of natural language processing, presenting a practical and efficient method for optimizing large language models for diverse and complex tasks.