Pelican Press · Posted May 23, 2025

DGX B200 Blackwell node sets world record, breaking over 1,000 TPS/user

Nvidia has reportedly broken another AI world record, surpassing the 1,000 tokens per second (TPS) per user barrier with Meta's Llama 4 Maverick large language model, according to a post on LinkedIn. The breakthrough was achieved with Nvidia's latest DGX B200 node, which features eight Blackwell GPUs.

Nvidia outperformed the previous record holder, AI chipmaker SambaNova, by 31%, achieving 1,038 TPS/user against SambaNova's prior record of 792 TPS/user. According to Artificial Analysis's benchmark report, Nvidia and SambaNova are well ahead of everyone else on this metric. Groq and one other provider achieved scores just shy of 300 TPS/user, while the rest (Fireworks, Lambda Labs, Kluster.ai, CentML, Vertex, Together.ai, Deepinfra, Novita, and Azure) all scored below 200 TPS/user.

Blackwell's record-breaking result was achieved using a raft of performance optimizations tailored to the Llama 4 Maverick architecture. Nvidia reportedly made extensive software optimizations using TensorRT and trained a speculative decoding draft model using Eagle-3 techniques, which accelerate LLM inference by predicting several tokens ahead of time and verifying them in parallel. These two optimizations alone delivered a 4x performance uplift over Blackwell's best prior results. FP8 data types (rather than BF16) were also used for Attention and Mixture of Experts (MoE) operations, increasing throughput while preserving accuracy; MoE is the technique that took the world by storm when the DeepSeek R1 model brought it to wide attention.
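The speculative decoding idea mentioned above can be sketched in a few lines. This is a minimal greedy-variant illustration, not Nvidia's Eagle-3 implementation: the `draft_model` and `target_model` functions below are invented toy stand-ins for a small fast model and the large model it accelerates.

```python
def draft_model(context):
    # Hypothetical cheap model: quickly proposes the next token.
    return (context[-1] + 1) % 100

def target_model(context):
    # Hypothetical expensive model: the token we actually want.
    # In this toy it happens to agree with the draft model.
    return (context[-1] + 1) % 100

def speculative_decode(context, n_draft=4, steps=8):
    """Generate `steps` tokens, drafting `n_draft` at a time cheaply
    and letting the target model verify (or correct) each one."""
    out = list(context)
    while steps > 0:
        # 1. Draft model proposes a short run of tokens.
        proposed, ctx = [], list(out)
        for _ in range(n_draft):
            t = draft_model(ctx)
            proposed.append(t)
            ctx.append(t)
        # 2. Target model checks each proposal; accept until the
        #    first mismatch, then substitute its own token and redraft.
        ctx = list(out)
        for t in proposed:
            if steps == 0:
                break
            expected = target_model(ctx)
            if t == expected:
                out.append(t)
                ctx.append(t)
                steps -= 1
            else:
                out.append(expected)
                steps -= 1
                break
    return out

print(speculative_decode([0], steps=5))  # → [0, 1, 2, 3, 4, 5]
```

When the draft model agrees with the target model often (as Eagle-3-style draft heads are trained to do), most tokens cost only a cheap draft pass plus a batched verification, which is where the speed-up comes from.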
Nvidia also shared a variety of other CUDA kernel optimizations its software engineers made to push performance further, including techniques such as spatial partitioning and GEMM weight shuffling.

TPS/user is an AI performance metric that stands for tokens per second per user. Tokens are the foundation of LLM-powered software such as Copilot and ChatGPT: when you type a question into one of these chatbots, your words and characters are split into tokens, and the LLM consumes those tokens and generates new ones to produce its answer. The "user" part of TPS/user indicates single-user-focused benchmarking, as opposed to measuring aggregate batched throughput. That distinction matters to AI chatbot developers because it reflects the experience of an individual person: the more tokens per second a GPU cluster can deliver to each user, the faster the chatbot responds to you.
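The single-user versus batched distinction is easy to see with some arithmetic. In this sketch, only the 1,038 figure comes from the article; the batched numbers are invented purely for illustration.

```python
def tps_per_user(tokens_generated, seconds, concurrent_users=1):
    # Per-user throughput: total tokens divided by wall time,
    # divided by how many users shared that generation capacity.
    return tokens_generated / seconds / concurrent_users

# Single-user benchmark: 1,038 tokens generated in one second.
print(tps_per_user(1038, 1.0))  # → 1038.0

# A hypothetical batched deployment: higher aggregate throughput,
# but each of the 32 concurrent users sees far fewer tokens per second.
print(tps_per_user(8000, 1.0, concurrent_users=32))  # → 250.0
```

This is why a provider can advertise huge aggregate throughput while individual users still wait noticeably for each response; TPS/user captures the latter.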