Apple and NVIDIA Collaborate to Implement Faster Text Generation Performance With Large Language Models

Dec 19, 2024 at 05:33 am

In a blog post today, Apple engineers shared new details on a collaboration with NVIDIA to implement faster text generation performance with large language models (LLMs).

Earlier this year, Apple published and open-sourced its Recurrent Drafter (ReDrafter) technique, a method for generating text with LLMs that is significantly faster and “achieves state of the art performance.” It combines two techniques: beam search, which explores multiple candidate continuations, and dynamic tree attention, which processes those candidates efficiently.

While its research demonstrated strong results, Apple also collaborated with NVIDIA to apply ReDrafter in production. As part of this collaboration, ReDrafter was integrated into NVIDIA TensorRT-LLM, a tool that helps run LLMs faster on NVIDIA GPUs.

Here are the results, in Apple's words:

To enable the integration of ReDrafter, NVIDIA added new operators or exposed existing ones, which considerably improved TensorRT-LLM’s capability to accommodate sophisticated models and decoding methods. ML developers using NVIDIA GPUs can now easily benefit from ReDrafter’s accelerated token generation for their production LLM applications with TensorRT-LLM.

In benchmarking a tens-of-billions parameter production model on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we have seen 2.7x speed-up in generated tokens per second for greedy decoding. These benchmark results indicate this tech could significantly reduce latency users may experience, while also using fewer GPUs and consuming less power.
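To put the 2.7x figure in perspective, here is a small back-of-the-envelope calculation. Only the 2.7x speed-up comes from the article; the baseline throughput and response length are hypothetical values chosen for illustration, and the calculation covers token generation only (not prompt processing or network overhead).

```python
# Only SPEEDUP is taken from the article (greedy decoding benchmark).
SPEEDUP = 2.7

# Hypothetical values, chosen purely for illustration.
BASELINE_TOKENS_PER_SECOND = 20.0
RESPONSE_TOKENS = 500


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating num_tokens at a steady throughput."""
    return num_tokens / tokens_per_second


baseline_s = generation_seconds(RESPONSE_TOKENS, BASELINE_TOKENS_PER_SECOND)
accelerated_s = generation_seconds(
    RESPONSE_TOKENS, BASELINE_TOKENS_PER_SECOND * SPEEDUP
)

# Fraction of generation time saved; this depends only on the speed-up,
# not on the hypothetical baseline: 1 - 1/2.7.
time_saved_fraction = 1.0 - accelerated_s / baseline_s

print(f"baseline:    {baseline_s:.1f} s")
print(f"accelerated: {accelerated_s:.1f} s")
print(f"time saved:  {time_saved_fraction:.0%}")
```

Whatever the baseline, a 2.7x throughput gain means generation takes about 37% of the original time, which is where the potential latency, GPU count, and power savings mentioned above would come from.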

“LLMs are increasingly being used to power production applications, and improving inference efficiency can both impact computational costs and reduce latency for users,” Apple’s machine learning researchers conclude. “With ReDrafter’s novel approach to speculative decoding integrated into the NVIDIA TensorRT-LLM framework, developers can now benefit from faster token generation on NVIDIA GPUs for their production LLM applications.”
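For readers unfamiliar with speculative decoding, the following is a minimal toy sketch of the general draft-and-verify idea under greedy decoding. It is not Apple's ReDrafter implementation and does not use TensorRT-LLM: it omits the beam search and dynamic tree attention described above, and the "models" are hypothetical stand-in functions over integer tokens.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]


def plain_greedy(target_next: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: one target-model step per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target_next(tokens))
    return tokens


def speculative_greedy(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[int],
    n: int,
    draft_len: int = 4,
) -> Tuple[List[int], int]:
    """Draft-and-verify decoding; returns (tokens, number of verify passes)."""
    tokens = list(prompt)
    goal = len(prompt) + n
    verify_passes = 0
    while len(tokens) < goal:
        # 1. The cheap draft model proposes several tokens in a row.
        draft: List[int] = []
        for _ in range(draft_len):
            draft.append(draft_next(tokens + draft))

        # 2. The target model checks the proposals. A real system would score
        #    all draft positions in one batched forward pass; this toy calls
        #    target_next per position but counts the loop as a single pass.
        verify_passes += 1
        accepted: List[int] = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            accepted.append(expected)
            if proposed != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # Every draft token matched, so the pass yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], verify_passes


# Hypothetical toy models: the target counts upward modulo 10; the draft
# agrees with it except after tokens 2, 5 and 8, where it guesses wrong.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] % 3 == 2 else (ctx[-1] + 1) % 10


reference = plain_greedy(toy_target, [0], 12)
speculative, passes = speculative_greedy(toy_target, toy_draft, [0], 12)
print(speculative == reference, passes)
```

The output matches plain greedy decoding token for token, because every accepted token is one the target model would have chosen anyway. The gain comes from needing fewer target-model passes than generated tokens whenever the draft guesses well.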

You can learn more about this work on Apple’s website and in a blog post on NVIDIA’s website.

Original source: 9to5mac
