NVIDIA's latest driver update delivers up to a threefold AI performance boost across its GeForce RTX GPUs, RTX PCs, and workstations. Unveiled during the Microsoft Build conference, the update positions the RTX platform, spanning GeForce RTX GPUs, workstations, and PCs, at the forefront of AI performance across market segments.
The improvements focus on Large Language Models (LLMs), which sit at the core of today's generative AI experiences. NVIDIA's new R555 driver lets RTX GPUs and RTX-powered AI PCs run up to three times faster with ONNX Runtime (ORT) and DirectML, the key tools for executing AI models on Windows PCs.
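To make the ORT/DirectML pairing concrete: applications pass ONNX Runtime a priority-ordered list of execution providers, and the runtime picks the first one the machine supports. On an RTX PC with the R555 driver, that would be the DirectML provider. The helper below mimics that selection logic in plain Python; the provider names are real ORT identifiers, but `select_provider` itself is an illustrative sketch, not part of the ORT API.

```python
def select_provider(requested, available):
    """Return the first requested execution provider that the machine supports,
    mirroring how ONNX Runtime walks its priority-ordered provider list."""
    for provider in requested:
        if provider in available:
            return provider
    raise RuntimeError("no requested execution provider is available")

# On an RTX PC, DirectML takes priority over the CPU fallback:
print(select_provider(
    ["DmlExecutionProvider", "CPUExecutionProvider"],   # app's preference order
    ["DmlExecutionProvider", "CPUExecutionProvider"],   # what the machine offers
))  # → DmlExecutionProvider
```

With the real library, the equivalent call is `onnxruntime.InferenceSession(model_path, providers=[...])` using the same provider names.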
WebNN acceleration through DirectML is another key enhancement, letting web developers deploy new AI models with RTX acceleration. Microsoft and NVIDIA are also collaborating on further RTX GPU performance gains and on DirectML support in PyTorch.
Here are the new capabilities the R555 driver introduces for NVIDIA's GeForce RTX GPUs and RTX PCs:
– Support for the DQ-GEMM metacommand, enabling INT4 weight-only quantization for LLMs.
– RMSNorm normalization for models such as Llama 2, Llama 3, Mistral, and Phi-3.
– Grouped-query and multi-query attention, along with sliding-window attention for improved Mistral support.
– In-place KV cache updates that improve attention performance.
– Better GEMM support for tensors whose dimensions are not multiples of 8, improving context-phase performance.
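Among the items above, RMSNorm is simple enough to show directly. Unlike LayerNorm, it skips mean subtraction and rescales each vector by its root mean square alone, which is part of why it is cheap to fuse into a GPU kernel. The pure-Python sketch below illustrates the math; real inference stacks run this as a fused operation on the GPU.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i.
    No mean subtraction, unlike LayerNorm."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

# With unit weights, the output vector has a root mean square of ~1:
y = rms_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print([round(v, 4) for v in y])
```

The `weight` vector is the learned per-channel gain that models like Llama and Mistral apply after normalization.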
NVIDIA’s benchmarks show substantial gains for generative AI running on ONNX Runtime, Microsoft’s cross-platform inference engine, particularly with the INT4 and FP16 data types. The gains, up to three times baseline performance, come from the new optimizations across a range of LLMs, including Phi-3, Llama 3, Gemma, and Mistral.
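The INT4 gains rest on weight-only quantization, the scheme behind the new DQ-GEMM metacommand: weights are rounded to 16 signed levels in [-8, 7] with a shared floating-point scale, while activations stay in FP16/FP32 and the weights are dequantized on the fly inside the matrix multiply. The sketch below illustrates the round trip with a single scale for the whole group; production schemes pick scales per block of weights, and the exact grouping here is an assumption for illustration.

```python
def quantize_int4(weights):
    """Map floats to 4-bit signed integers ([-8, 7]) with one shared scale."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # avoid zero scale
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate floats; DQ-GEMM does this inside the GEMM kernel."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.07]
q, s = quantize_int4(w)
print(q)                                          # 4-bit codes
print([round(v, 3) for v in dequantize_int4(q, s)])  # approximate originals
```

Storing 4-bit codes instead of 16-bit floats cuts weight memory (and memory bandwidth) by roughly 4x, which is where most of the LLM speedup comes from.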
Beyond these advancements, NVIDIA remains a frontrunner in the consumer AI PC space, backed by its TensorRT and TensorRT-LLM software stack. Its GPUs, equipped with Tensor Cores, power a range of technologies, including DLSS Super Resolution, NVIDIA ACE, RTX Remix, Omniverse, Broadcast, and RTX Video.
NVIDIA’s flagship GPUs deliver up to 1,300 TOPS of AI compute, far outpacing the year’s most powerful AI PC chips, which hover around the 100 TOPS mark. With upcoming PCs equipped with the latest NVIDIA RTX GPUs, the RTX AI PC platform is set to drive progress in the consumer AI segment and open the door to new cutting-edge AI experiences.