The new mode, called "Deep Think," is designed to help the model evaluate multiple hypotheses before answering a prompt. According to Google, it is based on new research methods and is currently being tested with a limited group of Gemini API users.
Google is introducing deeper reasoning capabilities and native audio output to its Gemini 2.5 Pro model in an experimental mode called "Deep Think."
This new mode, which is still under testing with a limited group of Gemini API users, encourages the model to consider multiple hypotheses before arriving at an answer.
The technology behind Deep Think is based on new research methods at Google AI, and the company claims that Gemini 2.5 Pro with Deep Think outperforms OpenAI's o3 model on several benchmarks.
These include the USAMO 2025 math test, the LiveCodeBench programming benchmark, and MMMU, a test for multimodal reasoning.
Gemini 2.5 Flash, optimized for speed and efficiency, has also been updated with improved performance on reasoning, multimodal tasks, and code generation.
The latest version of Flash can perform the same tasks using 20 to 30 percent fewer tokens.
Both Gemini 2.5 Pro and Flash now support native text-to-speech with multiple speaker profiles. The voice output can capture subtle effects like whispers and emotional tone, and supports more than 24 languages.
Developers can control accent, tone, and speaking style through the Live API.
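As a rough illustration of how that voice control surfaces to developers, the sketch below builds a text-to-speech request body. The field names (`responseModalities`, `speechConfig`, `prebuiltVoiceConfig`) and the voice name `"Kore"` are assumptions based on the v1beta REST surface of the Gemini API; check the current API reference before relying on them. Tone and style are steered through the prompt text itself.

```python
# Sketch: a Gemini API request body asking for audio output with a
# chosen prebuilt voice. Field names and the voice name "Kore" are
# assumptions about the v1beta REST surface, not confirmed here.
import json

def build_tts_request(text: str, voice_name: str = "Kore") -> dict:
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],  # request speech, not text
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
        },
    }

# Style cues such as whispering or cheerfulness are expressed in the prompt.
body = build_tts_request("Say cheerfully: have a wonderful day!")
print(json.dumps(body, indent=2))
```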
Two new features—"Affective Dialogue" and "Proactive Audio"—aim to make voice interactions feel more natural.
Affective Dialogue allows the model to detect emotion in a user's voice and respond accordingly—whether that’s neutrally, empathetically, or in a cheerful tone.
Proactive Audio helps filter out background conversations, so the AI only responds when it's directly addressed. The goal is to reduce accidental interactions and make voice control more reliable.
Google is also bringing features from Project Mariner into the Gemini API and Vertex AI, allowing the model to control computer applications like a web browser.
For developers, Gemini now includes "thought summaries", a structured view of the model’s internal reasoning and the actions it takes.
To manage performance, developers can configure "thinking budgets" to limit or disable the number of tokens the model uses for reasoning.
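A minimal sketch of what such a request might look like: it caps the reasoning tokens and asks for thought summaries in one config block. The field names (`thinkingConfig`, `thinkingBudget`, `includeThoughts`) are assumptions based on the v1beta REST surface; a budget of 0 would disable thinking entirely.

```python
# Sketch: a generateContent request body that limits reasoning tokens
# and requests thought summaries. Field names are assumptions about
# the v1beta REST surface of the Gemini API.
import json

def build_reasoning_request(prompt: str, thinking_budget: int,
                            include_thoughts: bool = True) -> dict:
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {
                "thinkingBudget": thinking_budget,    # cap on reasoning tokens
                "includeThoughts": include_thoughts,  # return thought summaries
            }
        },
    }

print(json.dumps(build_reasoning_request("Plan a 3-step refactor.", 1024), indent=2))
```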
The Gemini API also now supports Anthropic's Model Context Protocol (MCP), which could make it easier to integrate with open-source tools.
Google is exploring hosted MCP servers to support agent-based application development.
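For context, MCP is a JSON-RPC 2.0 protocol: a client invokes a server-side tool with a `tools/call` message. The sketch below shows the shape of such a message per the MCP specification; the tool name `search_docs` and its arguments are illustrative only.

```python
# Sketch: the JSON-RPC 2.0 message an MCP client sends to invoke a
# tool on an MCP server. The tool name and arguments are hypothetical.
import json

def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # standard MCP method for tool invocation
        "params": {"name": tool_name, "arguments": arguments},
    })

print(mcp_tool_call(1, "search_docs", {"query": "token budgets"}))
```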