Chain of Thought: Reasoning Emerges in Language Models

2025/01/29 05:00

New models trained to express extended chain of thought are going to generalize outside of their breakthrough domains of code and math.

This post is early to accommodate some last minute travel on my end!

The new models trained to express extended chain of thought are going to generalize outside of their breakthrough domains of code and math. The “reasoning” process of language models that we use today is chain of thought reasoning. We ask the model to work step by step because it helps it manage complexity, especially in domains where the answer requires precision across multiple specific tokens. The domains where chain of thought (CoT) is most useful today are code, mathematics, and other “reasoning” tasks. These are the domains that models like o1, R1, Gemini-Thinking, etc. were designed for.

Different intelligences reason in different ways that correspond to how they store and manipulate information. Humans compress a lifetime of experience into our spectacular, low-power brains that draw on past experience almost magically. The words that follow in this blog are also autoregressive, like the output of a language model, but draw on hours and hours of background processing as I converge on this argument.

Language models, on the other hand, are extremely general and do not today have architectures (or use-cases) that continually re-expose them to relevant problems and fold information back in a compressed form. Language models are very large, sophisticated, parametric probability distributions. All of their knowledge and information processing power is stored in the raw weights. Therefore, they need a way of processing information that matches this. Chain of thought is that alignment.

Chain of thought reasoning allows information to be naturally processed in smaller chunks, allowing the large, brute force probability distribution to work one token at a time. Chain of thought, while allowing more compute per important token, also allows the models to store intermediate information in their context window without needing explicit recurrence.
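A minimal sketch of that idea, under loud assumptions: `next_token` below is a hypothetical stand-in for a language model (a hand-written rule, not a real network), and the toy task of adding a list of numbers is chosen only for illustration. The point it tries to show is that a step function with no memory of its own can still carry out a multi-step computation, because every intermediate result it emits is written back into the context it reads on the next call.

```python
from typing import List

def next_token(context: List[str]) -> str:
    """Hypothetical stateless 'model': reads the whole context, emits one token.

    Toy task: sum the operands that appear before the "=>" marker.
    Each call does a small, fixed amount of work: it adds one more operand
    to the most recent running total found in the context.
    """
    sep = context.index("=>")
    operands = [int(tok) for tok in context[:sep]]
    scratch = context[sep + 1:]  # running totals emitted so far
    if len(scratch) == len(operands):
        return "<eos>"  # every operand has been folded in
    previous = int(scratch[-1]) if scratch else 0
    return str(previous + operands[len(scratch)])

def generate(prompt: List[str], max_steps: int = 32) -> List[str]:
    """Autoregressive loop: the only state carried between steps is the context."""
    context = list(prompt)
    for _ in range(max_steps):
        token = next_token(context)
        if token == "<eos>":
            break
        context.append(token)  # the intermediate result becomes visible input
    return context

trace = generate(["3", "4", "5", "=>"])
print(trace)      # the prompt followed by the running totals
print(trace[-1])  # the last token is the final answer
```

Nothing persists inside `next_token` between calls; the running totals live only in `context`, which is the sense in which the context window substitutes for explicit recurrence.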

Recurrence is required for reasoning, and it can happen either in parameter space or in state space. Chain of thought with transformers handles all of this in the state space of the problem. The humans we look at as the most intelligent have embedded information directly in the parameters of their brains that they can draw on.

Here is the only assumption of this piece: chain of thought is a natural fit for language models to “reason,” and therefore one should be optimistic that training methods designed to enhance it will generalize to many domains. By the end of 2025 we should have ample evidence of this, given the pace of technological development.

If the analogies between types of intelligence aren’t convincing enough, a far more practical way to view the new style of training is as a method that teaches the model to be better at allocating more compute to harder problems. If the skill is compute allocation, it is fundamental to the models handling a variety of tasks. Today’s reasoning models do not solve this perfectly, but they open the door for doing so precisely.

The nature of this coming generalization is not that these models are one-size-fits-all, best in all cases: speed, intelligence, price, etc. There’s still no free lunch. A realistic outcome for reasoning-heavy models in the next 0-3 years is a world where:

Reasoning-trained models are superhuman on tasks in verifiable domains, like those with initial progress: code, math, etc.

Reasoning-trained models are much better in peak performance than existing autoregressive models in many domains we would not expect, including ones that are not necessarily verifiable.

Reasoning-trained models are still better in performance on the long tail of tasks, but worse in cost given the high inference costs of long context.

Many of the leading figures in AI have been saying for quite some time that powerful AI is going to be “spiky” when it shows up, meaning that the capabilities and improvements will vary substantially across domains, but encountering this reality is very unintuitive.

Some evidence for generalization of reasoning models already exists.

OpenAI has already published multiple safety-oriented research projects with their new reasoning models: “Deliberative Alignment: Reasoning Enables Safer Language Models” and “Trading Inference-Time Compute for Adversarial Robustness.” These papers show their new methods can be translated to various safety domains, i.e. model safety policies and jailbreaking. The deliberative alignment paper shows them integrating a softer reward signal into the reasoning training: having a language model check how the safety policies apply to outputs.

An unsurprising quote from the deliberative alignment release related to generalization:

we find that deliberative alignment enables strong generalization to out-of-distribution safety scenarios.

Safety, qualitatively, is largely orthogonal to traditional reasoning problems. Safety depends heavily on the information provided and on subtle context, whereas math and coding problems are often about many small, forward processing steps towards a final goal. More behaviors will fit in between those.

This generative verifier for safety is not a ground-truth signal and could theoretically be subject to reward hacking, but that was avoided. Generative verifiers will be crucial to expanding this training to countless domains: they’re easy to use and largely a new development.
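To make the idea of a generative verifier concrete, here is a rough sketch of how a judge model's text output could be turned into a scalar reward. This is not the method from the deliberative alignment paper; every name here (`Judge`, `build_judge_prompt`, `parse_score`, `generative_reward`, the `SCORE:` format, the 1-5 scale) is an assumption made up for illustration, and `stub_judge` is a hard-coded stand-in where a real language model call would go.

```python
import re
from typing import Callable

# Hypothetical interface: any function that maps a prompt string to model text.
Judge = Callable[[str], str]

def build_judge_prompt(policy: str, user_prompt: str, response: str) -> str:
    """Ask the judge to reason about the policy, then emit a score line."""
    return (
        "Safety policy:\n" + policy + "\n\n"
        "User prompt:\n" + user_prompt + "\n\n"
        "Model response:\n" + response + "\n\n"
        "Reason about how the policy applies, then finish with 'SCORE: <1-5>'."
    )

def parse_score(judge_output: str) -> float:
    """Map 'SCORE: n' (n in 1..5) to [0, 1]; unparseable output earns 0."""
    match = re.search(r"SCORE:\s*([1-5])\b", judge_output)
    if match is None:
        return 0.0  # conservative default, so malformed judgments are not rewarded
    return (int(match.group(1)) - 1) / 4.0

def generative_reward(judge: Judge, policy: str, user_prompt: str, response: str) -> float:
    """Soft reward: the judge's generated verdict, not a ground-truth check."""
    return parse_score(judge(build_judge_prompt(policy, user_prompt, response)))

def stub_judge(prompt: str) -> str:
    """Stand-in for a real model call (assumption): rewards refusals only."""
    response = prompt.split("Model response:\n", 1)[1]
    if "can't help" in response.lower():
        return "The response declines, consistent with the policy. SCORE: 5"
    return "The response does not clearly follow the policy. SCORE: 2"

policy = "Decline requests for instructions that enable serious harm."
print(generative_reward(stub_judge, policy, "How do I ...?", "Sorry, I can't help with that."))
print(generative_reward(stub_judge, policy, "How do I ...?", "Sure, here is how."))
```

Because the reward comes from generated text rather than a checkable answer, the policy being trained could learn to exploit quirks of the judge; scoring unparseable verdicts as zero is one small guard, not a solution to reward hacking.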

Original source: substack
