DGX B200 Blackwell node sets world record, breaking over 1,000 TPS/user



Nvidia has reportedly broken another AI world record, surpassing the 1,000 tokens per second (TPS) per user barrier with Meta's Llama 4 Maverick large language model, according to a post on LinkedIn. The breakthrough was achieved with Nvidia's latest DGX B200 node, which features eight Blackwell GPUs.

Nvidia outperformed the previous record holder, AI chipmaker SambaNova, by 31%, achieving 1,038 TPS/user against SambaNova's prior record of 792 TPS/user. According to Artificial Analysis's benchmark report, Nvidia and SambaNova are well ahead of the rest of the field on this metric.

Groq achieved scores just shy of 300 TPS/user, while the rest — Fireworks, Lambda Labs, Kluster.ai, CentML, Vertex, Together.ai, Deepinfra, Novita, and Azure — all came in below 200 TPS/user.

Blackwell's record-breaking result was achieved using a raft of performance optimizations tailored to the Llama 4 Maverick architecture. Nvidia reportedly made extensive software optimizations using TensorRT and trained a speculative decoding draft model using EAGLE-3 techniques, which accelerate LLM inference by predicting several tokens ahead of time and letting the main model verify them in a single pass. These two optimizations alone delivered a 4x performance uplift over Blackwell's best prior results.



Performance was also improved by using FP8 data types (rather than BF16) for GEMM, attention, and Mixture of Experts (MoE) operations while maintaining accuracy; MoE is the technique that took the world by storm when the DeepSeek R1 model brought it to wide attention. Nvidia also shared a variety of other CUDA kernel optimizations its software engineers made to squeeze out further performance, including techniques such as spatial partitioning and GEMM weight shuffling.
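As a rough illustration of why FP8 helps, here is a toy per-tensor scaled-quantization sketch. The integer rounding below is only a stand-in for the real FP8 (E4M3) mantissa grid that Blackwell supports in hardware; the functions and values are illustrative, not TensorRT's actual implementation:

```python
# Toy sketch of scaled quantization in the spirit of FP8 (E4M3): scale
# weights so the largest magnitude maps near FP8's maximum representable
# value (about 448), round to the narrower format, and remember the scale
# so values can be restored. Halving the bytes per weight (vs. BF16)
# roughly doubles how many weights fit through the memory bus per second.
FP8_E4M3_MAX = 448.0

def quantize(weights):
    scale = FP8_E4M3_MAX / max(abs(w) for w in weights)
    return [round(w * scale) for w in weights], scale

def dequantize(quantized, scale):
    return [v / scale for v in quantized]

weights = [0.5, -1.25, 2.0, -0.125]
q, scale = quantize(weights)
restored = dequantize(q, scale)  # close to the original weights
```

The engineering challenge Nvidia describes is doing this without losing accuracy, which is why the FP8 work covers specific operation types (GEMM, attention, MoE) rather than the whole model indiscriminately.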

TPS/user is an AI performance metric: tokens per second, per user. Tokens are the basic units that LLM-powered software such as Copilot and ChatGPT operates on; when you type a question into ChatGPT or Copilot, your text is split into tokens (words or sub-word pieces), and the LLM generates its answer as a stream of output tokens.
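A toy illustration of the idea — real LLMs use sub-word tokenizers (such as byte-pair encoding) with large learned vocabularies, not the hypothetical word-level dictionary used here:

```python
# Toy word-level tokenizer: a prompt becomes a sequence of integer token
# IDs before the model ever sees it. Production tokenizers split text
# into sub-word pieces, so one word may become several tokens.
vocab = {"what": 0, "is": 1, "a": 2, "token": 3, "?": 4}

prompt = "what is a token ?"
token_ids = [vocab[word] for word in prompt.split()]
print(token_ids)  # [0, 1, 2, 3, 4]
```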

The "user" part of TPS/user denotes single-user benchmarking, as opposed to batched throughput aggregated across many simultaneous requests. This distinction matters to AI chatbot developers because it reflects what an individual person experiences: the more tokens per second a GPU cluster can deliver to each user, the faster an AI chatbot responds to you.
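Using the record figures from the benchmark above, the latency implication of TPS/user works out directly (the 500-token response length is chosen purely for illustration):

```python
# TPS/user translates directly into how long one user waits for a reply.
def response_time(num_tokens: int, tps_per_user: float) -> float:
    """Seconds to stream num_tokens to a single user."""
    return num_tokens / tps_per_user

t_blackwell = response_time(500, 1038)  # Nvidia's record pace
t_sambanova = response_time(500, 792)   # previous record pace

print(f"{t_blackwell:.2f}s vs {t_sambanova:.2f}s")  # 0.48s vs 0.63s
```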


Get Tom’s Hardware’s best news and in-depth reviews, straight to your inbox.



