Tiny Model, Recursive, Machine Learning: Is Less More?

2025/10/20 02:31

A look at the trend toward tiny, recursive models in machine learning, with a focus on TRM and its impact on efficiency and performance.

The world of machine learning is constantly evolving, with researchers always seeking ways to improve performance and efficiency. Lately, there's been buzz around tiny, recursive models. Let's dive into what's shaking up the field.

The Rise of Tiny Recursive Models

The recent work on TRM (Tiny Recursive Model) questions whether complexity is necessary at all. TRM has 5M to 19M parameters, versus 27M in HRM (Hierarchical Reasoning Model). These models represent a fascinating shift towards simplicity and efficiency, challenging the conventional wisdom that bigger is always better.

TRM: A Closer Look

TRM simplifies the recursive process. The model is designed around one small network, which is essentially a standard transformer block: [self-attention, norm, MLP, norm]. The original idea used 4 such blocks, but after experiments the authors settled on 2.
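
To make the block layout concrete, here is a minimal pure-Python sketch of a network built from two [self-attention, norm, MLP, norm] blocks. It is an illustration, not the authors' implementation: the residual connections, single attention head, ReLU activation, 4x MLP width, parameter-free normalization, tiny dimensions, and untrained random weights are all assumptions made for this example, and the names `Block` and `tiny_network` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = sequence positions, columns = features


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here for the (assumed) residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def norm(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each row to zero mean and unit variance (no learned scale)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class Block:
    """One [self-attention, norm, MLP, norm] block with untrained random weights."""

    def __init__(self, dim: int, rng: random.Random) -> None:
        self.wq = random_matrix(dim, dim, rng)
        self.wk = random_matrix(dim, dim, rng)
        self.wv = random_matrix(dim, dim, rng)
        self.w1 = random_matrix(dim, 4 * dim, rng)  # assumed 4x expansion
        self.w2 = random_matrix(4 * dim, dim, rng)

    def __call__(self, x: Matrix) -> Matrix:
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        scale = math.sqrt(len(x[0]))
        # Single-head scaled dot-product attention over all positions.
        weights = [
            softmax([sum(a * b for a, b in zip(qr, kr)) / scale for kr in k])
            for qr in q
        ]
        x = norm(add(x, matmul(weights, v)))  # self-attention, then norm
        hidden = [[max(0.0, h) for h in row] for row in matmul(x, self.w1)]  # ReLU
        return norm(add(x, matmul(hidden, self.w2)))  # MLP, then norm


def tiny_network(x: Matrix, blocks: List[Block]) -> Matrix:
    """Apply the blocks in sequence; the output keeps the input's shape."""
    for block in blocks:
        x = block(x)
    return x


rng = random.Random(0)
blocks = [Block(dim=4, rng=rng) for _ in range(2)]  # 2 blocks, as in the text
tokens = random_matrix(3, 4, rng)                   # 3 positions, 4 features
out = tiny_network(tokens, blocks)
print(len(out), len(out[0]))  # prints "3 4"
```

Because the output has the same shape as the input, the same network can be applied to its own output repeatedly, which is what makes the recursive scheme possible.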

The network takes three elements at its input: the input (x), the latent (z), and the prediction (y), which are all summed into a single value. The basic iteration, analogous to the L module in HRM, produces a latent value (z, also written z_L in the recursion formula) at the layer output. The updated z is then fed back to the module input, where it is added to the input (x), this time as a non-zero value. The output prediction (y, also written z_H in the formula) is added as well, but since it has not been updated yet, it contributes nothing new.
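
The latent update described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: `toy_network` is a hypothetical stand-in for the small shared network (a fixed affine map, not a transformer block), the vectors are plain lists of floats, y and z are assumed to start at zero, and only the latent loop is shown, not the later update of y.

```python
from typing import Callable, List

Vector = List[float]


def toy_network(v: Vector) -> Vector:
    """Hypothetical stand-in for the small shared network (NOT the real TRM block)."""
    return [0.5 * a + 0.1 for a in v]


def add(*vectors: Vector) -> Vector:
    """Sum any number of equal-length vectors element-wise."""
    return [sum(vals) for vals in zip(*vectors)]


def latent_recursion(
    net: Callable[[Vector], Vector],
    x: Vector,
    y: Vector,
    z: Vector,
    n_steps: int,
) -> Vector:
    """Repeatedly feed the updated latent z back into the same network.

    x (input) and y (prediction) stay fixed inside this loop; only z changes.
    """
    for _ in range(n_steps):
        z = net(add(x, y, z))  # x, y and z are summed into one input
    return z


x = [1.0, 2.0]   # input
y = [0.0, 0.0]   # prediction, assumed to start at zero
z = [0.0, 0.0]   # latent, assumed to start at zero

z_after_1 = latent_recursion(toy_network, x, y, z, n_steps=1)
z_after_3 = latent_recursion(toy_network, x, y, z, n_steps=3)
print(z_after_1)
print(z_after_3)
```

On the first pass z is zero, so the network effectively sees only x. From the second pass on, the non-zero z changes the summed input, while the unchanged y keeps contributing the same amount every time.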

Key Insights and Performance

TRM's best variant beats HRM on every benchmark listed: 74.7%/87.4% (attention version/MLP version) versus 55% on Sudoku; 85.3% (attention version; the MLP version scores 0) versus 74.5% on Maze; 44.6%/29.6% (attention/MLP) versus 40.3% on ARC-AGI-1; and 7.8%/2.4% (attention/MLP) versus 5.0% on ARC-AGI-2. The experiments don't look very expensive either: according to the repo, runtime ranges from under 24 hours to about three days at most on 4 H100 GPUs.
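
For a quick side-by-side view, the figures quoted above can be tabulated and the gap to HRM computed directly. The numbers are copied from the text; the dictionary keys and the helper name `best_trm_gain` are illustrative choices, not taken from the paper or the repo.

```python
# Accuracy figures (percent) exactly as quoted in the text above.
results = {
    "Sudoku":    {"trm_attn": 74.7, "trm_mlp": 87.4, "hrm": 55.0},
    "Maze":      {"trm_attn": 85.3, "trm_mlp": 0.0,  "hrm": 74.5},
    "ARC-AGI-1": {"trm_attn": 44.6, "trm_mlp": 29.6, "hrm": 40.3},
    "ARC-AGI-2": {"trm_attn": 7.8,  "trm_mlp": 2.4,  "hrm": 5.0},
}


def best_trm_gain(row: dict) -> float:
    """Gap in percentage points between the better TRM variant and HRM."""
    best = max(row["trm_attn"], row["trm_mlp"])
    return round(best - row["hrm"], 1)


gains = {task: best_trm_gain(row) for task, row in results.items()}
for task, gain in gains.items():
    print(f"{task}: best TRM variant is {gain:+.1f} points vs HRM")
```

Note that the winning variant differs by task: the MLP version leads on Sudoku, while the attention version leads on Maze and both ARC-AGI benchmarks.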

My Two Cents

While the theoretical reasons why these recursions work so well are not fully understood yet, the empirical results are hard to ignore. TRM's architectural inventiveness, as opposed to endless model scaling, is a breath of fresh air. It would be interesting to see how the approach behaves as the dataset scales.

Looking Ahead

The journey of tiny recursive models in machine learning is just beginning, and there's a lot more to explore. So, let's keep an eye on these tiny titans and see where they take us next. Good recursions to everyone!

Original source: substack

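Disclaimer: info@kdj.com

The information provided is not trading advice. kdj.com assumes no responsibility for any investments made based on the information provided in this article. Cryptocurrencies are highly volatile, so invest cautiously and only after thorough research!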
