Nvidia unveiled the impressive results of its new Blackwell architecture in the international MLPerf Training v5.0 benchmark. The new chips, designed specifically for the needs of large language models and multimodal AI systems, delivered up to 2.5-fold performance gains over the previous generation.
As part of the tests, Blackwell was the only architecture that successfully passed all MLPerf categories, including the most resource-intensive scenario: training the Llama 3.1 405B model, which belongs to the class of large language models (LLMs). This was made possible by using the Tyche and Nyx supercomputers built in partnership with CoreWeave and IBM. In total, the test infrastructure included 2,496 Blackwell GPUs and 1,248 Nvidia Grace CPUs.
The results were verified by the international association MLCommons, which unites more than 125 technology leaders and scientific organizations. The confirmed data shows that on the task of fine-tuning the Llama 2 70B model using LoRA technology, Nvidia DGX B200 systems with eight Blackwell GPUs delivered 2.5-fold performance growth, setting a new industry benchmark.
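The LoRA (Low-Rank Adaptation) technique mentioned in the benchmark keeps the pretrained weights frozen and trains only a small low-rank correction, which is why it is so much cheaper than full fine-tuning. The sketch below illustrates the core idea in plain NumPy; the matrix shapes and scaling factor are hypothetical teaching values, not details of Nvidia's or the MLPerf submission's implementation.

```python
import numpy as np

# Illustrative LoRA sketch (hypothetical shapes, not Nvidia's implementation):
# instead of updating the frozen weight W directly, train a low-rank update
# delta_W = (alpha / r) * B @ A, where r is much smaller than the layer width.

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 64, 64, 4, 8   # r << d_in keeps the update cheap

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, shape (r, d_in)
B = np.zeros((d_out, r))                    # trainable, zero-initialized

def forward(x):
    # Base projection plus the scaled low-rank correction.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y0 = forward(x)
# With B initialized to zero, the adapted model matches the frozen one.
assert np.allclose(y0, W @ x)

# Trainable parameter counts: 2*r*d for LoRA vs d*d for full fine-tuning.
lora_params = A.size + B.size   # 4*64 + 64*4 = 512
full_params = W.size            # 64*64 = 4096
print(lora_params, full_params)
```

Here the adapter trains 512 parameters instead of 4,096 per layer; at the scale of a 70B-parameter model, that reduction is what makes single-node fine-tuning runs like the benchmarked DGX B200 scenario practical.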
The breakthrough was made possible by a variety of engineering and software solutions:
- Using liquid cooling of server racks;
- Up to 13.4 TB of coherent memory per rack;
- Fifth-generation Nvidia NVLink and NVLink Switch connectivity technologies;
- Nvidia Quantum-2 InfiniBand network that provides horizontal scaling of computing;
- Improvements to the NeMo Framework software stack focused on training multimodal AI models and agents.
Dave Salvator, Director of Accelerated Computing Products at Nvidia, explained: "MLPerf's objectivity and versatility make it a reliable reference point for the industry. But the real economic impact begins not at the testing stage, but in the process of deploying models and creating intelligent solutions."
One of the key areas of Nvidia's development is the creation of "AI factories": specialized data centers for training and operating AI agents capable of reasoning, decision-making, and real-time interaction. These complexes combine GPUs, CPUs, network solutions, and a complete software stack, from CUDA-X to the TensorRT-LLM and Dynamo frameworks.
Blackwell and NeMo are becoming the foundation for a new generation of AI applications, from medicine and finance to science and government. Nvidia is confident that the transition from a chip manufacturer to a system integrator will allow it to maintain its technological leadership and accelerate the transformation of AI infrastructure around the world.