Thinkless: A Framework for Dynamically Choosing Between Short-Form and Long-Form Reasoning in Language Models

2025/05/23 13:59

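The effectiveness of language models depends on their ability to simulate human-like, step-by-step deduction. These reasoning sequences are resource-intensive, however, and are wasted on simple questions that do not require elaborate computation. A lack of awareness of task complexity is one of the core challenges for these models: they often default to detailed reasoning even for queries that could be answered directly.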

Researchers from the National University of Singapore have developed a new framework called Thinkless that enables a language model to autonomously decide whether to use short or long-form reasoning, tailoring its response to the complexity of the task at hand.

The framework, which is built on reinforcement learning, introduces two special control tokens:

* <short> for concise answers, and

* <think> for detailed responses.

By incorporating a novel algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless separates the training focus between selecting the reasoning mode and improving the accuracy of the generated response.

This design prevents the model from falling into one-dimensional behavior and enables adaptive reasoning tailored to each query.
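As a minimal sketch of how such control tokens might be consumed downstream, the snippet below inspects the start of a generated string and reports which reasoning mode was chosen. The token spellings `<short>` and `<think>` and the helper name `split_mode` are assumptions made for illustration; they are not taken from the Thinkless code.

```python
# Hypothetical control-token spellings, assumed for this illustration.
SHORT, THINK = "<short>", "<think>"


def split_mode(generated: str) -> tuple[str, str]:
    """Return (mode, body) for text that begins with a control token.

    mode is "short", "think", or "unknown" when no control token leads the text.
    """
    text = generated.lstrip()
    for token, mode in ((SHORT, "short"), (THINK, "think")):
        if text.startswith(token):
            # Drop the control token and surrounding whitespace from the body.
            return mode, text[len(token):].strip()
    return "unknown", text


print(split_mode("<short> 4"))
print(split_mode("<think> First, add the two numbers. The answer is 4."))
```

A caller could use the returned mode to log how often each reasoning style is chosen per benchmark.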

The methodology involves two stages: warm-up distillation and reinforcement learning. In the distillation phase, Thinkless is trained using outputs from two expert models—one specializing in short responses and the other in detailed reasoning. This stage helps the model establish a firm link between the control token and the desired reasoning format.
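The warm-up stage can be pictured as building a paired supervised dataset: each prompt appears twice, once with the short expert's answer and once with the long expert's reasoning, each prefixed by its control token. The sketch below is an illustration under assumptions: the token spellings, the `Example` type, `build_warmup_set`, and the stub experts are all hypothetical stand-ins, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical control-token spellings, assumed for this illustration.
SHORT, THINK = "<short>", "<think>"


@dataclass
class Example:
    prompt: str
    target: str  # control token followed by an expert's response


def build_warmup_set(
    prompts: Iterable[str],
    short_expert: Callable[[str], str],
    long_expert: Callable[[str], str],
) -> List[Example]:
    """Pair every prompt with both a short and a long expert response."""
    examples: List[Example] = []
    for prompt in prompts:
        examples.append(Example(prompt, f"{SHORT} {short_expert(prompt)}"))
        examples.append(Example(prompt, f"{THINK} {long_expert(prompt)}"))
    return examples


# Stub experts standing in for the two teacher models.
data = build_warmup_set(
    ["What is 2 + 2?"],
    short_expert=lambda p: "4",
    long_expert=lambda p: "Adding 2 and 2 gives 4. The answer is 4.",
)
for ex in data:
    print(ex.target)
```

Training on both targets for the same prompt is what would tie each control token to its response format before reinforcement learning decides which one to prefer.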

The reinforcement learning stage then fine-tunes the model’s ability to decide which reasoning mode to use. DeGRPO decomposes the learning into two separate objectives: one for training the control token and another for refining the response tokens.

This approach avoids the gradient imbalances in earlier models, where longer responses would overpower the learning signal, leading to a collapse in reasoning diversity. Thinkless ensures that both the <short> and <think> tokens receive balanced updates, promoting stable learning across response types.
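One way to see the imbalance: if a loss is averaged over every token in a sequence, the single control token carries a weight of 1/(T+1) for a response of T tokens, so its share shrinks as responses grow. The sketch below contrasts that with a decoupled weighting, where the control token gets a fixed coefficient and the response tokens are averaged on their own. The function names, the exact normalization, and the value of `alpha` are illustrative assumptions, not the paper's implementation.

```python
def joint_control_weight(response_len: int) -> float:
    """Weight on the control token when the loss is a mean over all tokens."""
    if response_len < 1:
        raise ValueError("response_len must be positive")
    return 1.0 / (response_len + 1)


def decoupled_weights(response_len: int, alpha: float = 0.001) -> tuple[float, float]:
    """Return (control-token weight, per-response-token weight).

    alpha is an arbitrary placeholder; the point is that the control weight
    no longer depends on the response length.
    """
    if response_len < 1:
        raise ValueError("response_len must be positive")
    return alpha, 1.0 / response_len


for length in (20, 2000):
    joint = joint_control_weight(length)
    control, per_token = decoupled_weights(length)
    print(f"T={length}: joint control weight={joint:.6f}, "
          f"decoupled control weight={control:.6f}, per-token weight={per_token:.6f}")
```

Under the joint average the control token's weight differs by roughly two orders of magnitude between the short and long response, while the decoupled version keeps it constant.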

When evaluated, Thinkless significantly reduced long-form reasoning while preserving high accuracy. On the Minerva Algebra benchmark, the model used the <think> token in only 25.88% of cases while achieving 94.59% accuracy. In contrast, conventional reasoning models had to use extended chains of thought much more frequently.

On the AIME 2024 dataset, Thinkless reached 27.33% accuracy while using the reasoning mode 100% of the time, showing that it could maintain performance when full reasoning was necessary. On the GSM8K dataset, it used <think> only 13.31% of the time, yet still achieved 84.18% accuracy.

These results reflect the model’s ability to handle simple and complex queries with appropriate reasoning depth, cutting down on unnecessary token generation by as much as 90% in some tasks.
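The reported usage rates translate directly into the share of queries answered without extended reasoning. The short calculation below uses only the percentages quoted in this article; the dictionary layout and function name are illustrative. Note that this is a share of queries, not a token count: actual token savings also depend on how long each kind of response is, which the article does not report.

```python
# Percentage of queries on which the long reasoning mode was used (from the article).
THINK_USAGE_PCT = {
    "Minerva Algebra": 25.88,
    "GSM8K": 13.31,
    "AIME 2024": 100.0,
}


def short_mode_share(think_pct: float) -> float:
    """Percentage of queries answered in short mode, given long-mode usage."""
    return round(100.0 - think_pct, 2)


for name, pct in THINK_USAGE_PCT.items():
    print(f"{name}: {short_mode_share(pct):.2f}% of queries answered in short mode")
```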

This study, titled "Thinkless: Equipping Language Models for Autonomous Depth Control in Reasoning," is a valuable contribution to the field of natural language processing, presenting a practical and efficient method for optimizing large language models for diverse and complex tasks.

Original source: marktechpost
