Explore how NVIDIA Rubin CPX accelerates inference for long-context AI workloads, with a focus on efficiency and ROI.

The AI landscape is evolving rapidly, and inference has become the new frontier. NVIDIA's Rubin CPX GPU is designed to meet the demands of long-context AI workloads more efficiently and with a higher return on investment (ROI).
The Rise of Long-Context AI
Modern AI models can now perform multi-step reasoning and work with long-horizon context, enabling them to tackle complex tasks. Processing massive context has become increasingly critical, particularly in software development, where a model may need to reason over an entire codebase, and in video generation. These applications demand sustained coherence and memory across millions of tokens, pushing the boundaries of current infrastructure.
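To see why million-token contexts strain memory, consider the key/value (KV) cache a transformer keeps for every token it has processed. The sketch below is a back-of-envelope estimate, not a Rubin CPX specification: the model dimensions (80 layers, 8 KV heads, head dimension 128, 1 byte per cached element) are hypothetical values chosen only for illustration.

```python
# Back-of-envelope KV-cache sizing for a decoder-only transformer.
# All model dimensions below are HYPOTHETICAL, chosen only for illustration.


def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens.

    The leading factor of 2 counts both the key and the value tensors.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return num_tokens * per_token


# Hypothetical model: 80 layers, 8 KV heads (grouped-query attention),
# head dimension 128, cache stored at 1 byte per element (an 8-bit format).
for tokens in (128_000, 1_000_000, 4_000_000):
    gb = kv_cache_bytes(tokens, 80, 8, 128, 1) / 1e9
    print(f"{tokens:>9,} tokens -> {gb:7.1f} GB of KV cache")
```

Under these assumptions the cache grows linearly at about 164 KB per token, so a single one-million-token sequence needs roughly 164 GB of cache before any model weights are counted.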
NVIDIA's SMART Framework and Disaggregated Inference
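To address this shift, the NVIDIA SMART framework optimizes inference across scale, performance, architecture, ROI, and the broader ecosystem. With disaggregated inference, the context and generation phases are processed independently. The two phases stress hardware differently: the context (prefill) phase ingests the full input to produce the first output token and is compute-bound, while the generation (decode) phase produces output tokens one at a time and is limited by memory bandwidth. Running each phase on resources matched to its bottleneck improves throughput, reduces latency, and raises overall resource utilization.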
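The split between the two phases can be illustrated with a toy Python model. Every class and function name here (KVCache, ContextWorker, GenerationWorker, serve) is hypothetical and does not correspond to any NVIDIA API, and the "model" is a placeholder arithmetic rule. The point is the shape of the pipeline: one pool ingests the prompt and builds the cache, then hands it to a separate pool that generates tokens.

```python
from dataclasses import dataclass, field

# Toy model of disaggregated inference. All names are HYPOTHETICAL and do not
# correspond to any NVIDIA product API; the "model" is a placeholder rule.


@dataclass
class KVCache:
    """Stand-in for the key/value state produced by the context phase."""
    tokens: list[int] = field(default_factory=list)


class ContextWorker:
    """Context (prefill) pool: ingests the whole prompt in one pass.
    In practice this phase is compute-bound."""

    def prefill(self, prompt: list[int]) -> KVCache:
        # A real system runs one batched forward pass over all prompt tokens;
        # here we just copy them to represent the resulting cache.
        return KVCache(tokens=list(prompt))


class GenerationWorker:
    """Generation (decode) pool: emits one token per step, re-reading the
    cache each time. In practice this phase is memory-bandwidth-bound."""

    def decode(self, cache: KVCache, max_new_tokens: int) -> list[int]:
        generated = []
        for _ in range(max_new_tokens):
            # Placeholder "model": next token = sum of cached tokens mod 100.
            next_token = sum(cache.tokens) % 100
            cache.tokens.append(next_token)  # cache grows by one entry per step
            generated.append(next_token)
        return generated


def serve(prompt: list[int], ctx: ContextWorker, gen: GenerationWorker,
          max_new_tokens: int = 4) -> list[int]:
    cache = ctx.prefill(prompt)               # phase 1: context pool
    return gen.decode(cache, max_new_tokens)  # phase 2: generation pool


print(serve([1, 2, 3], ContextWorker(), GenerationWorker()))  # [6, 12, 24, 48]
```

Because the two workers share nothing but the handed-off cache, each pool can be sized and provisioned independently. In a real deployment that handoff means moving the KV cache between devices, which is an overhead disaggregation has to amortize.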
Introducing NVIDIA Rubin CPX
NVIDIA is introducing the Rubin CPX GPU, purpose-built to deliver high throughput for high-value, long-context inference workloads. Built on the Rubin architecture, it features 30 petaFLOPS of NVFP4 compute, 128 GB of GDDR7 memory, and 3x faster attention than NVIDIA GB300 NVL72 systems. Because it targets the compute-intensive context phase of long sequences, Rubin CPX improves throughput and responsiveness and helps maximize ROI for large-scale generative AI workloads.
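A rough FLOP count shows why attention throughput matters for long sequences in the context phase. This sketch assumes a hypothetical dense 70B-parameter model with 80 layers and a hidden size of 8192. It uses the common approximations of about 2 FLOPs per parameter per token for the weight matrix multiplications and about 4·T²·d per layer for attention (halved for causal masking), and it optimistically assumes every FLOP runs at the 30 petaFLOPS NVFP4 peak. Real utilization is lower, so treat the times as lower bounds.

```python
# Rough FLOP count for the context (prefill) phase of a dense transformer.
# Model dimensions are HYPOTHETICAL; the peak rate is the quoted NVFP4 figure
# and assumes 100% utilization, so the times are optimistic lower bounds.

PEAK_FLOPS = 30e15  # 30 petaFLOPS (NVFP4 peak)


def prefill_flops(num_params: float, seq_len: int, num_layers: int,
                  d_model: int) -> tuple[float, float]:
    """Return (weight_flops, attention_flops) for one causal forward pass."""
    # Weight matmuls: ~2 FLOPs (one multiply, one add) per parameter per token.
    weight = 2 * num_params * seq_len
    # Q @ K^T and softmax(...) @ V each cost ~2 * T^2 * d per layer;
    # causal masking skips roughly half of that work.
    attention = 0.5 * num_layers * 4 * seq_len**2 * d_model
    return weight, attention


for seq_len in (128_000, 1_000_000):
    weight, attn = prefill_flops(70e9, seq_len, num_layers=80, d_model=8192)
    ideal_s = (weight + attn) / PEAK_FLOPS
    print(f"T={seq_len:>9,}: attention/weight = {attn / weight:4.1f}x, "
          f"ideal prefill = {ideal_s:5.1f} s")
```

Weight FLOPs grow linearly with sequence length while attention FLOPs grow quadratically. At 128K tokens the two are of similar size, but at one million tokens attention accounts for roughly nine times as much work as the weight multiplications.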
The NVIDIA Vera Rubin NVL144 CPX Rack
Rubin CPX handles the context phase and works in tandem with NVIDIA Vera CPUs and Rubin GPUs, which handle generation-phase processing, to form a complete, high-performance disaggregated serving solution. The NVIDIA Vera Rubin NVL144 CPX rack integrates 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs to deliver 8 exaFLOPS of NVFP4 compute and 100 TB of high-speed memory.
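The rack totals can be cross-checked against the per-GPU Rubin CPX figures. This is simple arithmetic on the numbers quoted in this article only (decimal units, rack totals as rounded by NVIDIA). How the remaining compute and memory divide among the Rubin GPUs and Vera CPUs is not stated here, so the script reports only the remainder.

```python
# Back-of-envelope split of the Vera Rubin NVL144 CPX rack totals, using only
# the quoted figures (decimal units: 1 EF = 1000 PF, 1 TB = 1000 GB).
# The rack totals are rounded, so the remainders are approximate.

CPX_COUNT = 144
CPX_NVFP4_PFLOPS = 30      # per Rubin CPX GPU
CPX_GDDR7_GB = 128         # per Rubin CPX GPU
RACK_NVFP4_EFLOPS = 8      # whole rack
RACK_FAST_MEMORY_TB = 100  # whole rack

cpx_compute_ef = CPX_COUNT * CPX_NVFP4_PFLOPS / 1000
cpx_memory_tb = CPX_COUNT * CPX_GDDR7_GB / 1000

print(f"Rubin CPX compute: {cpx_compute_ef:.2f} EF "
      f"({cpx_compute_ef / RACK_NVFP4_EFLOPS:.0%} of the rack)")
print(f"Remaining compute (rest of rack): "
      f"~{RACK_NVFP4_EFLOPS - cpx_compute_ef:.2f} EF")
print(f"Rubin CPX GDDR7: {cpx_memory_tb:.1f} TB "
      f"({cpx_memory_tb / RACK_FAST_MEMORY_TB:.0%} of the rack's fast memory)")
```

By this arithmetic the 144 Rubin CPX GPUs account for 4.32 exaFLOPS, about 54% of the rack's NVFP4 compute, and 18.4 TB of GDDR7, about 18% of its fast memory; the Rubin GPUs and Vera CPUs supply the rest.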
Real-World Impact and ROI
NVIDIA projects that, at scale, the platform can deliver a 30x to 50x return on investment, translating to as much as $5B in revenue from $100M of capital expenditure (CAPEX). By combining disaggregated infrastructure, acceleration, and full-stack orchestration, Vera Rubin NVL144 CPX redefines what’s possible for enterprises building the next generation of generative AI applications.
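The ROI figures follow from simple multiplication. In the sense used here, the multiple is revenue divided by capital cost (not net profit divided by cost), which is why 50x on $100M corresponds to $5B. The helper name below is illustrative; the inputs are the ones quoted above.

```python
# ROI multiple as used here: revenue / capital expenditure.
# The 30x-50x range and the $100M outlay are the quoted figures;
# the function name is illustrative only.


def projected_revenue(capex_usd: float, roi_multiple: float) -> float:
    """Revenue implied by a given ROI multiple on a capital outlay."""
    return capex_usd * roi_multiple


CAPEX = 100e6  # $100M

for multiple in (30, 50):
    revenue = projected_revenue(CAPEX, multiple)
    print(f"{multiple}x on ${CAPEX / 1e6:.0f}M -> ${revenue / 1e9:.1f}B")
```

The 30x end of the range corresponds to $3B on the same $100M outlay, so the $5B figure is the upper end of the quoted range.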
Conclusion
The NVIDIA Rubin CPX GPU and the NVIDIA Vera Rubin NVL144 CPX rack represent a new standard for full-stack AI infrastructure, creating new possibilities for workloads like advanced software coding and generative video. It's an exciting time to be in AI, and NVIDIA is leading the charge!