
xAI Colossus supercomputer with 100K H100 GPUs comes online — Musk lays out plans to double GPU count to 200K with 50K H100 and 50K H200




Elon Musk’s X (formerly Twitter) has brought the world’s most powerful training system online. The Colossus supercomputer uses as many as 100,000 Nvidia H100 GPUs for training and is set to expand with another 50,000 Nvidia H100 and H200 GPUs in the coming months.

“This weekend, the xAI team brought our Colossus 100K H100 training cluster online,” Elon Musk wrote on X. “From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200K (50K H200s) in a few months.”

According to Michael Dell, head of the eponymous high-tech giant, Dell developed and assembled the Colossus system quickly, which highlights the considerable experience the server maker has accumulated deploying AI servers during the last few years’ AI boom.

Elon Musk and his companies have been busy making supercomputer-related announcements recently. In late August, Tesla announced its Cortex AI cluster, featuring 50,000 Nvidia H100 GPUs and 20,000 of Tesla’s wafer-sized Dojo AI chips. Even before that, in late July, X kicked off AI training on the Memphis Supercluster, comprising 100,000 liquid-cooled H100 GPUs. That supercomputer would have to consume at least 150 MW of power, as 100,000 H100 GPUs alone draw around 70 MW.
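For a rough sense of where those figures come from, here is a minimal back-of-the-envelope sketch in Python. The 700 W per-GPU value is the published TDP of an H100 SXM module; the facility overhead multiplier is a hypothetical assumption chosen to be consistent with the 150 MW site estimate, not a disclosed number.

# Back-of-the-envelope power estimate for a 100K-GPU cluster.
GPU_COUNT = 100_000
H100_TDP_WATTS = 700        # published TDP of an H100 SXM module
FACILITY_OVERHEAD = 2.1     # hypothetical multiplier for CPUs, networking, cooling

gpu_power_mw = GPU_COUNT * H100_TDP_WATTS / 1e6
site_power_mw = gpu_power_mw * FACILITY_OVERHEAD

print(f"GPUs alone: {gpu_power_mw:.0f} MW")   # ~70 MW, matching the article
print(f"Whole site: {site_power_mw:.0f} MW")  # ~147 MW, in line with the 150 MW estimate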

Although all of these clusters are formally operational and even training AI models, it is entirely unclear how many GPUs are actually online today. First, it takes time to debug and optimize the settings of such superclusters. Second, xAI needs to ensure the machines get enough power: while Elon Musk’s company has been using 14 diesel generators to power its Memphis supercomputer, they were still not enough to feed all 100,000 H100 GPUs.
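To illustrate why the generators fall short, here is a hedged sketch: the per-generator rating below is an assumption (the article does not state the units’ capacity; large mobile diesel generators are commonly rated in the low single-digit megawatts), while the 70 MW GPU load comes from the figures above.

# Hypothetical check on the diesel-generator shortfall.
GENERATORS = 14
MW_PER_GENERATOR = 2.5   # assumed rating per unit; actual capacity not disclosed
GPU_LOAD_MW = 70         # article's figure for 100,000 H100 GPUs

available_mw = GENERATORS * MW_PER_GENERATOR
shortfall_mw = GPU_LOAD_MW - available_mw
print(f"Generator capacity: {available_mw:.0f} MW")      # ~35 MW
print(f"Shortfall vs. GPU load: {shortfall_mw:.0f} MW")  # roughly half the GPU load unmet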

xAI’s training of the Grok 2 large language model (LLM) required up to 20,000 Nvidia H100 GPUs, and Musk has predicted that future versions, such as Grok 3, will need even more resources: potentially around 100,000 Nvidia H100 processors for training. To that end, xAI needs its vast data centers to train Grok 3 and then run inference on the model.


