Apple's new visual model is fast

2025/05/13 05:59

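Over the past few months, there have been many rumors and reports about Apple's plans to release AI-enabled wearables. For now, it looks like Apple's direct competitors to the Meta Ray-Bans will launch around 2027, alongside AirPods equipped with cameras.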

Apple has been busy developing its own AI technologies, and recently offered a glimpse into how its models might work.

Currently, Apple’s direct competitors to the Meta Ray-Bans are planned for around 2027, together with AirPods equipped with cameras, which will provide their own set of AI-enabled capabilities.

While it is still too early to know exactly what those devices will look like, Apple has released MLX, its own open ML framework designed specifically for Apple Silicon.

Essentially, MLX provides a lightweight method to train and run models directly on Apple devices, remaining familiar to developers who prefer frameworks and languages more traditionally used for AI development.

Apple’s visual model is blazing fast

Now, Apple’s Machine Learning Research team has published FastVLM: a Visual Language Model (VLM) that leverages MLX to deliver nearly instantaneous high-resolution image processing, requiring significantly less computational power compared to similar models.

As Apple explains in its report:

Based on a comprehensive efficiency analysis of the interplay between image resolution, vision latency, token count, and LLM size, we introduce FastVLM—a model that achieves an optimized trade-off between latency, model size, and accuracy.

At the heart of FastVLM is an encoder named FastViTHD, designed specifically for efficient VLM performance on high-resolution images.

It's up to 3.2 times faster and 3.6 times smaller than comparable models. This is a significant advantage when aiming to process information directly on the device without relying on the cloud to generate a response to what the user has just asked or is looking at.

Moreover, FastVLM was designed to output fewer tokens, which is crucial during inference—the step where the model interprets the data and generates a response.

According to Apple, the model delivers an 85 times faster time-to-first-token than similar models. Time-to-first-token is the time between the user submitting a prompt and receiving the first token of the answer. Fewer tokens on a faster, lighter model translate into quicker processing.
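
To make the time-to-first-token metric concrete, here is a minimal sketch of how it could be measured, assuming a model that is exposed as a Python iterator of tokens. The names `fake_token_stream` and `time_to_first_token` are hypothetical and are not part of FastVLM or MLX, and the 0.05 second delay is an arbitrary stand-in for prompt and image processing.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_token_stream(tokens: Iterable[str], prefill_delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model: waits once, then yields tokens."""
    # Simulated work done before the first token is ready (assumption).
    time.sleep(prefill_delay_s)
    for token in tokens:
        yield token


def time_to_first_token(stream: Iterator[str]) -> Tuple[float, str]:
    """Return (seconds until the first token arrives, the first token)."""
    start = time.perf_counter()
    first = next(stream)  # blocks until the stream produces its first token
    return time.perf_counter() - start, first


if __name__ == "__main__":
    ttft, first = time_to_first_token(fake_token_stream(["A", " cat", "."], 0.05))
    print(f"first token {first!r} after {ttft:.3f} s")
```

Only the wait before the first token is timed here. Total response time would also depend on how many tokens follow and how quickly each one is produced.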

The FastVLM model is available on GitHub, and the report detailing its architecture and performance can be found on arXiv.
