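The tech giant announced enhancements to Gemini 2.5 Flash, which is now better across nearly every dimension, including benchmarks for reasoning, code and long context.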
Google is moving closer to its goal of a “universal AI assistant” that can understand context, plan and take action.
Today at Google I/O, the tech giant announced enhancements to Gemini 2.5 Flash, which is now better across nearly every dimension, including benchmarks for reasoning, code and long context. It also announced updates to 2.5 Pro, including an experimental enhanced reasoning mode, ‘Deep Think,’ that allows Pro to consider multiple hypotheses before responding.
“This is our ultimate goal for the Gemini app: An AI that’s personal, proactive and powerful,” Demis Hassabis, CEO of Google DeepMind, said in a press pre-brief.
‘Deep Think’ scores impressively on top benchmarks
Google announced Gemini 2.5 Pro — what it considers its most intelligent model yet, with a one-million-token context window — in March, and released its “I/O” coding edition earlier this month (with Hassabis calling it “the best coding model we’ve ever built!”).
“We’ve been really impressed by what people have created, from turning sketches into interactive apps to simulating entire cities,” said Hassabis.
He noted that, based on Google’s experience with AlphaGo, AI model responses improve when they’re given more time to think. This led DeepMind scientists to develop Deep Think, which uses Google’s latest cutting-edge research in thinking and reasoning, including parallel techniques.
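Google has not published how Deep Think works internally, so the following is only a generic illustration of the broad idea of weighing several candidate answers before responding. It is a minimal Python sketch of parallel sampling with a majority vote; the `pick_by_majority` helper and the toy `attempts` are hypothetical stand-ins, not model calls and not Google's method.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def pick_by_majority(solvers: Sequence[Callable[[str], str]], question: str) -> str:
    """Run every candidate solver in parallel and return the most common answer."""
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        # pool.map preserves the order of the solvers in its results.
        answers = list(pool.map(lambda solve: solve(question), solvers))
    # most_common(1) returns [(answer, count)] for the most frequent answer.
    return Counter(answers).most_common(1)[0][0]


# Toy stand-ins for independent reasoning attempts (hypothetical).
attempts = [lambda q: "42", lambda q: "41", lambda q: "42"]
print(pick_by_majority(attempts, "What is 6 * 7?"))
```

In this toy run two of the three attempts agree, so the shared answer is the one returned. A real system would spend more compute per attempt and would likely use a stronger selection rule than a simple vote.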
Deep Think has shown impressive scores on the hardest math and coding benchmarks, including the 2025 USA Mathematical Olympiad (USAMO). It also leads on LiveCodeBench, a difficult benchmark for competition-level coding, and scores 84.0% on MMMU, which tests multimodal understanding and reasoning.
Hassabis added, “We’re taking a bit of extra time to conduct more frontier safety evaluations and get further input from safety experts.” (In practice, Deep Think is for now available only to trusted testers via the API, who can give feedback before the capability is made widely available.)
Overall, the new 2.5 Pro leads the popular coding leaderboard WebDev Arena with an Elo score of 1420 (intermediate to proficient). Elo ratings measure the relative skill level of players in two-player games like chess. It also leads across all categories of the LMArena leaderboard, which evaluates AI based on human preference.
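For readers unfamiliar with Elo, the classic chess formulation turns a rating gap into an expected score. The Python sketch below shows that standard formula. The 1320-rated opponent is a hypothetical value chosen for illustration, and arena-style leaderboards may fit their ratings with a different statistical procedure.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """New rating after one game; actual is 1 for a win, 0.5 draw, 0 loss."""
    return rating + k * (actual - expected)


# A 1420-rated entrant against a hypothetical 1320-rated opponent.
expected = elo_expected_score(1420, 1320)
print(f"expected score: {expected:.2f}")
print(f"rating after a win: {elo_update(1420, expected, 1.0):.1f}")
```

A 100-point lead corresponds to an expected score of roughly 0.64, meaning the higher-rated side is expected to be preferred in about 64% of head-to-head comparisons.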
Since its launch, “we’ve been really impressed by what [users have] created, from turning sketches into interactive apps to simulating entire cities,” said Hassabis.
Important updates to Gemini 2.5 Pro, Flash
Also today, Google announced an enhanced 2.5 Flash, considered its workhorse model designed for speed, efficiency and low cost. 2.5 Flash has been improved across the board in benchmarks for reasoning, multimodality, code and long context — Hassabis noted that it’s “second only” to 2.5 Pro on the LMArena leaderboard. The model is also more efficient, using 20 to 30% fewer tokens.
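To put the 20 to 30% figure in concrete terms, the short Python sketch below applies both ends of that range to a workload. The one-million-token baseline is a hypothetical number picked for easy arithmetic and does not come from Google.

```python
def tokens_after_savings(baseline_tokens: int, reduction: float) -> int:
    """Tokens used after cutting a baseline by the given fraction (0.2 = 20%)."""
    return round(baseline_tokens * (1 - reduction))


baseline = 1_000_000  # hypothetical workload, in tokens
fewest_savings = tokens_after_savings(baseline, 0.20)
most_savings = tokens_after_savings(baseline, 0.30)
print(f"{most_savings:,} to {fewest_savings:,} tokens instead of {baseline:,}")
```

Under those assumptions the same workload would consume between 700,000 and 800,000 tokens.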
Google is making final adjustments to 2.5 Flash based on developer feedback; it is now available for preview in Google AI Studio, Vertex AI and in the Gemini app. It will be generally available for production in early June.
Google is bringing additional capabilities to both Gemini 2.5 Pro and 2.5 Flash, including native audio output to create more natural conversational experiences, text-to-speech to support multiple speakers, thought summaries and thinking budgets.
With native audio output (in preview), users can steer Gemini’s tone, accent and style of speaking (think: directing the model to be melodramatic or maudlin when telling a story). Like Project Mariner, the model is also equipped with tool use, allowing it to search on users’ behalf.
Other experimental early voice features include affective dialogue, which gives the model the ability to detect emotion in user voice and respond accordingly; proactive audio that allows it to tune out background conversations; and thinking in the Live API to support more complex tasks.
New multiple-speaker features in both Pro and Flash support more than 24 languages, and the models can quickly switch from one dialect to another. “Text-to-speech is expressive and can capture subtle nuances, such as whispers,” Koray Kavukcuoglu, CTO of Google DeepMind, and Tulsee Doshi, senior director for product management at Google DeepMind, wrote in a blog post published today.
Further, 2.5 Pro and Flash now include thought summaries in the Gemini API and Vertex AI. These “take the model’s raw thoughts and organize them into a clear format with headers, key details, and information about model actions, like when they use tools,” Kavukcuoglu and Doshi explain. The goal is to provide a more structured, streamlined format for the model’s thinking.
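For developers, thinking budgets and thought summaries are set per request. The Python sketch below only builds a JSON request body and sends nothing. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`, `includeThoughts`) reflect the Gemini API's REST format as best understood at the time of writing and should be checked against the current reference. The `build_thinking_request` helper, the prompt and the budget of 1024 tokens are hypothetical.

```python
import json


def build_thinking_request(prompt: str, thinking_budget: int,
                           include_thoughts: bool = True) -> dict:
    """Build a request body that caps thinking tokens and asks for thought summaries."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {
                # Assumed field names; verify against the current API reference.
                "thinkingBudget": thinking_budget,
                "includeThoughts": include_thoughts,
            }
        },
    }


request_body = build_thinking_request("Plan a three-step refactor.", 1024)
print(json.dumps(request_body, indent=2))
```

A larger budget gives the model more room to reason at higher cost and latency, while a smaller one trades depth for speed.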