'Meaning Machine' visualizes how large language models break words into tokens and process them

2025/04/23 18:00

Generative AI built on large language models, such as ChatGPT, Claude, and Grok, responds to user input much as a human would. However, large language models process language very differently from the way humans do. The website 'Meaning Machine' offers an easy-to-follow visualization of how that processing works.

Meaning Machine · Streamlit

On the Meaning Machine page, the input sentence is split into words, and each word is shown with the numeric ID it receives when the large language model represents it as a 'token.'

Joshua Hathcock, the developer of Meaning Machine, explains that large language models do not process a sentence as a whole. Instead, they split words and character sequences into tokens, each represented by a numeric ID, and process those tokens abstractly. For example, in the GPT models on which ChatGPT is based, common words such as 'The,' 'young,' 'student,' 'didn't,' and 'submit' are often represented by a single token, while rare words are split into multiple tokens made up of subwords. The large language model then identifies the grammatical role of each token and infers the subject, verb, object, and so on of the sentence. In the example sentence, the subject is 'student,' the verb is 'submit,' and the object is 'report.'
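
The splitting of common and rare words described above can be sketched with a toy tokenizer. This is only an illustration under stated assumptions: the vocabulary `VOCAB`, its IDs, and the greedy longest-match rule are invented here, whereas real GPT models use a learned byte-pair-encoding vocabulary with different pieces and IDs.

```python
# Toy greedy longest-match subword tokenizer.
# VOCAB and its numeric IDs are hypothetical, chosen only for this sketch.
VOCAB = {
    "the": 1, "young": 2, "student": 3, "didn't": 4, "submit": 5,
    "report": 6, "dis": 7, "ser": 8, "tation": 9,
}
UNKNOWN_ID = 0  # assumed ID for pieces that are not in the vocabulary


def tokenize_word(word, vocab):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No vocabulary piece starts here: emit a single character.
            pieces.append(word[start])
            start += 1
    return pieces


def encode(sentence, vocab):
    """Return (piece, id) pairs for every word in the sentence."""
    result = []
    for word in sentence.lower().split():
        for piece in tokenize_word(word.strip(".,!?"), vocab):
            result.append((piece, vocab.get(piece, UNKNOWN_ID)))
    return result


for piece, token_id in encode("The student didn't submit the dissertation.", VOCAB):
    print(f"{piece!r:12} -> {token_id}")
```

In this sketch the common words map to one ID each, while the rarer word 'dissertation' is covered by three subword pieces.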

The large language model tags each token with its part of speech (POS) and maps the dependencies within the sentence, building a structured representation of it.
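
As a rough sketch of what such a structured representation might look like, the snippet below stores a POS tag, a dependency label, and a head index for each token, then reads off the subject, verb, and object. Everything here is an assumption made for illustration: the sentence is reconstructed from the words the article quotes, the annotations are written by hand rather than produced by a parser, and the labels (`nsubj`, `dobj`, `ROOT`) follow common dependency-grammar conventions. 'didn't' is kept as one token for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str   # part-of-speech tag
    dep: str   # dependency label relative to the head token
    head: int  # index of the head token; the root points to itself


# Hand-written annotations (hypothetical, not parser output).
SENTENCE = [
    Token("The", "DET", "det", 2),
    Token("young", "ADJ", "amod", 2),
    Token("student", "NOUN", "nsubj", 4),
    Token("didn't", "AUX", "aux", 4),
    Token("submit", "VERB", "ROOT", 4),
    Token("the", "DET", "det", 6),
    Token("report", "NOUN", "dobj", 4),
]


def find_roles(tokens):
    """Pick out the main verb and its direct subject and object."""
    root = next(i for i, t in enumerate(tokens) if t.dep == "ROOT")
    roles = {"verb": tokens[root].text, "subject": None, "object": None}
    for token in tokens:
        if token.head != root:
            continue
        if token.dep == "nsubj":
            roles["subject"] = token.text
        elif token.dep in ("dobj", "obj"):
            roles["object"] = token.text
    return roles


print(find_roles(SENTENCE))
```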

The meaning of each dependency label is explained in the table at the bottom of the Meaning Machine page.

Each token is then converted into a list (vector) of hundreds of numbers that capture its meaning and context. Meaning Machine visualizes each token of the example sentence in two dimensions by applying dimensionality reduction.
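
One common way to reduce vectors to two dimensions is principal component analysis (PCA); the article does not say which method Meaning Machine uses, so PCA is an assumption here. The sketch below uses only the standard library, finds the two leading principal directions by power iteration, and works on invented 4-dimensional vectors standing in for real embeddings with hundreds of dimensions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: repeated multiplication converges to the leading direction."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v


def deflate(matrix, v):
    """Remove the component along v so the next direction can be found."""
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    d = len(matrix)
    return [[matrix[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
            for i in range(d)]


def pca_2d(vectors):
    """Project vectors onto their two leading principal directions."""
    centered = center(vectors)
    cov = covariance(centered)
    pc1 = top_eigenvector(cov)
    pc2 = top_eigenvector(deflate(cov, pc1))
    return [(dot(r, pc1), dot(r, pc2)) for r in centered]


# Hypothetical 4-dimensional "embeddings"; real ones are learned, not hand-set.
EMBEDDINGS = {
    "The": [0.9, 0.1, 0.0, 0.1],
    "young": [0.2, 0.8, 0.1, 0.3],
    "student": [0.1, 0.7, 0.6, 0.2],
    "submit": [0.0, 0.2, 0.9, 0.7],
    "report": [0.1, 0.6, 0.7, 0.1],
}

for word, (x, y) in zip(EMBEDDINGS, pca_2d(list(EMBEDDINGS.values()))):
    print(f"{word:8} x={x:+.3f} y={y:+.3f}")
```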

Below that is a tree of the dependencies, showing which tokens depend on which other tokens and how they fit together into the meaning of the whole sentence.
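
A dependency tree of this kind can be stored as a head index per token and printed as an indented outline. The snippet is a minimal sketch: the sentence and head indices are hand-written assumptions, not output from Meaning Machine or any parser.

```python
# (text, head index) pairs; the root token lists itself as its head.
# Both the sentence and the head indices are hypothetical examples.
TOKENS = [
    ("The", 2), ("young", 2), ("student", 4), ("didn't", 4),
    ("submit", 4), ("the", 6), ("report", 4),
]


def build_children(tokens):
    """Invert the head pointers into a root index and a children map."""
    children = {i: [] for i in range(len(tokens))}
    root = None
    for i, (_, head) in enumerate(tokens):
        if head == i:
            root = i
        else:
            children[head].append(i)
    return root, children


def render_tree(tokens):
    """Return the tree as indented lines, dependents nested under heads."""
    root, children = build_children(tokens)
    lines = []

    def walk(index, depth):
        lines.append("  " * depth + tokens[index][0])
        for child in children[index]:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


print("\n".join(render_tree(TOKENS)))
```

Here the verb sits at the root, its subject, auxiliary, and object hang directly below it, and the determiners and adjective hang below their nouns.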

You can navigate through the dependencies by dragging the bar at the bottom of the diagram left and right.

In Meaning Machine, you can enter any sentence you like into the input form at the top of the page to see how the large language model converts each word into tokens and how it captures the dependencies across the whole sentence.

'These technical steps reveal something deeper: language models don't understand language the way humans do,' Hathcock said. 'They simulate language convincingly, but in a fundamentally different way. When you or I say "dog," we might recall the feel of fur, the sound of a bark, and even an emotional response. But when a large language model sees the word "dog," it sees a vector of numbers shaped by how often "dog" appears near words like "bark," "tail," "vet," and so on. This is not wrong; it has statistical meaning. But it has no substance, no basis, no knowledge.' In other words, large language models and humans process language in fundamentally different ways, and no matter how human-like a response may be, there are no beliefs or goals behind it.
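
The idea of a word vector shaped by neighboring words can be illustrated with simple co-occurrence counts. This is a deliberately simplified sketch: the tiny corpus, the window size, and the counting scheme are assumptions for illustration, and real language models learn dense vectors during training rather than counting neighbors directly.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, invented for this sketch.
CORPUS = [
    "the dog began to bark at the vet",
    "the dog wagged its tail",
    "the cat flicked its tail at the vet",
    "the student wrote the report",
    "the student did submit the report",
]


def cooccurrence_vector(target, sentences, window=3):
    """Count the words that appear within `window` positions of the target."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if word != target:
                continue
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[words[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    numerator = sum(a[w] * b[w] for w in set(a) & set(b))
    denominator = (math.sqrt(sum(v * v for v in a.values()))
                   * math.sqrt(sum(v * v for v in b.values())))
    return numerator / denominator if denominator else 0.0


dog = cooccurrence_vector("dog", CORPUS)
cat = cooccurrence_vector("cat", CORPUS)
report = cooccurrence_vector("report", CORPUS)

print("dog vs cat   :", round(cosine(dog, cat), 3))
print("dog vs report:", round(cosine(dog, report), 3))
```

In this toy corpus 'dog' ends up closer to 'cat' than to 'report' purely because of shared neighbors, with no reference to what a dog actually is.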

Despite this, large language models are already widely used in society: they write resumes, filter content, and sometimes even determine what is considered valuable. Because AI is already becoming social infrastructure, Hathcock argued that it is important to understand the difference between performance and understanding in large language models.
