Pelican Press · Posted May 23, 2025

DGX B200 Blackwell node sets world record, breaking over 1,000 TPS/user

Nvidia has reportedly broken another AI world record, surpassing the 1,000 tokens per second (TPS) per user barrier with Meta's Llama 4 Maverick large language model, according to a post on LinkedIn. The breakthrough was achieved with Nvidia's latest DGX B200 node, which features eight Blackwell GPUs.

Nvidia outperformed the previous record holder, AI chipmaker SambaNova, by 31%, achieving 1,038 TPS/user against SambaNova's prior record of 792 TPS/user. According to Artificial Analysis's benchmark report, Nvidia and SambaNova are well ahead of everyone else on this metric. Groq and one other provider achieved scores just shy of 300 TPS/user, while the rest (Fireworks, Lambda Labs, Kluster.ai, CentML, Vertex, Together.ai, Deepinfra, Novita, and Azure) all scored below 200 TPS/user.

Blackwell's record-breaking result was achieved using a raft of performance optimizations tailored to the Llama 4 Maverick architecture. Nvidia reportedly made extensive software optimizations using TensorRT and trained a speculative decoding draft model using Eagle-3 techniques, which accelerate LLM inference by predicting several tokens ahead of time and verifying them in parallel. These two optimizations alone delivered a 4x performance uplift over Blackwell's best prior results. FP8 data types (rather than BF16) were also used for Attention and Mixture of Experts (MoE) operations, increasing throughput while preserving accuracy; MoE is the technique that took the world by storm when the DeepSeek R1 model brought it to wide attention.
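The speculative decoding idea mentioned above can be sketched in a few lines. This is a minimal greedy-variant illustration, not Nvidia's Eagle-3 implementation: the `draft_model` and `target_model` functions below are invented toy stand-ins for a small fast model and the large model it accelerates.

```python
def draft_model(context):
    # Hypothetical cheap model: quickly proposes the next token.
    return (context[-1] + 1) % 100

def target_model(context):
    # Hypothetical expensive model: the token we actually want.
    # In this toy it happens to agree with the draft model.
    return (context[-1] + 1) % 100

def speculative_decode(context, n_draft=4, steps=8):
    """Generate `steps` tokens, drafting `n_draft` at a time cheaply
    and letting the target model verify (or correct) each one."""
    out = list(context)
    while steps > 0:
        # 1. Draft model proposes a short run of tokens.
        proposed, ctx = [], list(out)
        for _ in range(n_draft):
            t = draft_model(ctx)
            proposed.append(t)
            ctx.append(t)
        # 2. Target model checks each proposal; accept until the
        #    first mismatch, then substitute its own token and redraft.
        ctx = list(out)
        for t in proposed:
            if steps == 0:
                break
            expected = target_model(ctx)
            if t == expected:
                out.append(t)
                ctx.append(t)
                steps -= 1
            else:
                out.append(expected)
                steps -= 1
                break
    return out

print(speculative_decode([0], steps=5))  # → [0, 1, 2, 3, 4, 5]
```

When the draft model agrees with the target model often (as Eagle-3-style draft heads are trained to do), most tokens cost only a cheap draft pass plus a batched verification, which is where the speed-up comes from.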
Nvidia also shared a variety of other CUDA kernel optimizations its software engineers made to push performance further, including techniques such as spatial partitioning and GEMM weight shuffling.

TPS/user is an AI performance metric that stands for tokens per second per user. Tokens are the foundation of LLM-powered software such as Copilot and ChatGPT: when you type a question into one of these chatbots, your words and characters are split into tokens, and the LLM consumes those tokens and generates new ones to produce its answer. The "user" part of TPS/user indicates single-user-focused benchmarking, as opposed to measuring aggregate batched throughput. That distinction matters to AI chatbot developers because it reflects the experience of an individual person: the more tokens per second a GPU cluster can deliver to each user, the faster the chatbot responds to you.
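The single-user versus batched distinction is easy to see with some arithmetic. In this sketch, only the 1,038 figure comes from the article; the batched numbers are invented purely for illustration.

```python
def tps_per_user(tokens_generated, seconds, concurrent_users=1):
    # Per-user throughput: total tokens divided by wall time,
    # divided by how many users shared that generation capacity.
    return tokens_generated / seconds / concurrent_users

# Single-user benchmark: 1,038 tokens generated in one second.
print(tps_per_user(1038, 1.0))  # → 1038.0

# A hypothetical batched deployment: higher aggregate throughput,
# but each of the 32 concurrent users sees far fewer tokens per second.
print(tps_per_user(8000, 1.0, concurrent_users=32))  # → 250.0
```

This is why a provider can advertise huge aggregate throughput while individual users still wait noticeably for each response; TPS/user captures the latter.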