Cryptocurrency News

Apple's new visual model is blazing fast

2025/05/13 05:59

Over the past few months, there have been plenty of rumors and reports about Apple's plans to release AI-enabled wearables. Currently, it looks like Apple's direct competitor to the Meta Ray-Bans will arrive around 2027, together with camera-equipped AirPods.

Apple has been busy developing its own AI technologies, and recently offered a glimpse into how its models might work.

Currently, Apple’s direct competitors to the Meta Ray-Bans are planned for around 2027, together with AirPods equipped with cameras, which will provide their own set of AI-enabled capabilities.

While it’s still too early to know precisely what they will look like, Apple unveiled MLX, its own open ML framework designed specifically for Apple Silicon.

Essentially, MLX provides a lightweight way to train and run models directly on Apple devices, while remaining familiar to developers coming from the frameworks and languages traditionally used for AI development.

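To make that concrete, here is a minimal sketch of what MLX's NumPy-style Python API looks like. The toy linear-regression loop below is our illustration (assuming the open-source mlx package on an Apple Silicon Mac), not code from Apple:

```python
import mlx.core as mx

def loss_fn(w, x, y):
    # Mean-squared error of a linear model.
    return mx.mean((x @ w - y) ** 2)

x = mx.random.normal((64, 8))        # synthetic inputs
true_w = mx.random.normal((8, 1))    # hidden ground-truth weights
y = x @ true_w                       # synthetic targets

w = mx.zeros((8, 1))
grad_fn = mx.grad(loss_fn)           # gradient w.r.t. the first argument
for _ in range(200):
    w = w - 0.05 * grad_fn(w, x, y)  # plain gradient descent
mx.eval(w)                           # MLX is lazy; force the computation
print(loss_fn(w, x, y))              # loss should now be near zero
```

Arrays live in Apple Silicon's unified memory, so the same code runs on CPU and GPU without explicit data transfers.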
Apple’s visual model is blazing fast

Now, Apple’s Machine Learning Research team has published FastVLM: a Visual Language Model (VLM) that leverages MLX to deliver nearly instantaneous high-resolution image processing, requiring significantly less computational power compared to similar models.

As Apple explains in its report:

Based on a comprehensive efficiency analysis of the interplay between image resolution, vision latency, token count, and LLM size, we introduce FastVLM—a model that achieves an optimized trade-off between latency, model size, and accuracy.

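The intuition behind that trade-off can be made concrete with some rough arithmetic. The cost model and the numbers below are illustrative assumptions on our part, not figures from Apple's report: decoder prefill work grows roughly linearly with the number of visual tokens the encoder emits, so an encoder that emits fewer tokens directly cuts the work done before the first output token.

```python
def prefill_flops(n_tokens: int, d_model: int = 4096, n_layers: int = 32) -> int:
    # Rough rule of thumb: ~24 * d_model^2 FLOPs per token per layer
    # (attention projections + MLP), ignoring the quadratic attention term.
    return n_tokens * n_layers * 24 * d_model ** 2

many_tokens = 1024  # hypothetical high-resolution encoder output
few_tokens = 256    # hypothetical encoder emitting 4x fewer tokens
print(prefill_flops(many_tokens) / prefill_flops(few_tokens))  # -> 4.0
```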
At the heart of FastVLM is an encoder named FastViTHD, designed specifically for efficient VLM performance on high-resolution images.

It's up to 3.2 times faster and 3.6 times smaller than comparable models. This is a significant advantage when aiming to process information directly on the device without relying on the cloud to generate a response to what the user has just asked or is looking at.

Moreover, FastVLM was designed to output fewer tokens, which is crucial during inference—the step where the model interprets the data and generates a response.

According to Apple, its model boasts an 85-times-faster time-to-first-token than similar models: the time it takes from the user submitting a prompt to receiving the first token of the answer. Fewer tokens from a faster, lighter model translate into swifter processing.

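Operationally, time-to-first-token is just the wall-clock gap between submitting a prompt and receiving the first streamed token. The sketch below is generic; generate_stream stands in for any hypothetical streaming-generation API:

```python
import time
from typing import Callable, Iterator, Tuple

def time_to_first_token(generate_stream: Callable[[str], Iterator[str]],
                        prompt: str) -> Tuple[float, str]:
    # Returns (seconds until the first token, the token itself).
    start = time.perf_counter()
    stream = generate_stream(prompt)  # assumed to yield tokens lazily
    first = next(stream)              # blocks until the first token arrives
    return time.perf_counter() - start, first
```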
The FastVLM model is available on GitHub, and the report detailing its architecture and performance can be found on arXiv.
