- Elon Musk says xAI will reach AI compute equal to 50 million H100 GPUs in just five years
- xAI's training target is 50 million H100 equivalents of compute, not 50 million literal GPUs
- Matching that compute with H100s would require power equal to 35 nuclear plants
Elon Musk has shared a bold new goal for xAI: deploying the equivalent of 50 million H100 GPUs by 2030.
The claim refers to compute capacity as a measure of AI training performance, not to a count of literal units.
However, even with ongoing advances in AI accelerator hardware, this goal entails extraordinary infrastructure commitments, particularly in power and capital.
A massive leap in compute scale, with fewer GPUs than it seems
In a post on X, Musk said that xAI's goal is 50 million units of H100-equivalent AI compute, but with much better power efficiency, within five years.
Each NVIDIA H100 AI GPU delivers roughly 1,000 TFLOPS in FP16 or BF16, the formats commonly used for AI training, so matching the target with that baseline would theoretically require 50 million H100s.
Newer architectures such as Blackwell and Rubin, however, considerably improve per-chip performance. According to performance projections, only about 650,000 GPUs based on a future Feynman Ultra architecture might be needed to reach the target.
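As a quick sanity check, the article's two headline figures imply how much faster each projected chip must be than an H100. This is a back-of-envelope sketch using only the numbers above; the roughly 77x per-chip factor is derived here, not stated in the article.

```python
# Figures from the article
H100_EQUIVALENTS = 50_000_000   # Musk's stated compute target, in H100 units
PROJECTED_GPUS = 650_000        # projected count of future Feynman Ultra GPUs

# How many H100s each future GPU must match for 650k chips to hit the target
per_chip_factor = H100_EQUIVALENTS / PROJECTED_GPUS
print(f"Each future GPU must match ~{per_chip_factor:.0f} H100s")  # ~77
```

In other words, the 650,000-GPU figure assumes each future chip delivers nearly two orders of magnitude more training throughput than an H100.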
The company has already started building aggressively: its current Colossus 1 cluster runs on 200,000 Hopper-based H100 and H200 GPUs, plus 30,000 Blackwell-based GB200 chips.
A new cluster, Colossus 2, is expected to come online soon with more than one million GPUs, combining 550,000 GB200 and GB300 nodes.
This puts xAI among the fastest adopters of cutting-edge AI model training technology.
The company likely chose the H100 over the newer H200 as its yardstick because the former remains a well-understood reference point in the AI community, widely benchmarked and used in major deployments.
Its consistent FP16/BF16 throughput makes it a clear unit of measurement for longer-term planning.
But perhaps the most pressing problem is energy. A cluster of 50 million H100 GPUs would require about 35 GW, the output of 35 nuclear power plants.
Even with the most efficient projected GPUs, such as Feynman Ultra, a cluster of that scale might still require up to 4.685 GW of power.
That is more than triple the power draw of xAI's upcoming Colossus 2. Even with efficiency gains, the scale of the energy supply remains a key uncertainty.
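The 35 GW figure falls out directly from the GPU count. A minimal sketch, assuming the H100's published 700 W TDP (the wattage is not stated in the article itself):

```python
# Power back-of-envelope for an all-H100 build-out
H100_TDP_W = 700          # H100 SXM board power (assumption, from NVIDIA specs)
GPU_COUNT = 50_000_000    # the article's 50 million H100 equivalents

total_gw = GPU_COUNT * H100_TDP_W / 1e9
print(f"All-H100 cluster draw: ~{total_gw:.0f} GW")  # ~35 GW
```

At roughly 1 GW per typical nuclear reactor, 35 GW is indeed the output of about 35 plants, which is where the article's comparison comes from. Note this counts GPU board power only; cooling and networking would push the real total higher.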
Cost will also be a problem. At current prices, a single NVIDIA H100 costs more than $25,000.
Using 650,000 next-generation GPUs could still represent tens of billions of dollars in hardware alone, not counting interconnects, cooling, facilities, and energy infrastructure.
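The "tens of billions" claim can be grounded with a simple floor estimate, applying the H100's ~$25,000 price to the projected GPU count. This is a lower bound by construction, since next-generation chips will almost certainly cost more per unit:

```python
# Hardware-cost floor: H100 pricing applied to the projected next-gen count
UNIT_PRICE_USD = 25_000   # article's quoted H100 price (next-gen GPUs cost more)
GPU_COUNT = 650_000       # projected Feynman Ultra count

floor_usd = UNIT_PRICE_USD * GPU_COUNT
print(f"GPU hardware floor: ~${floor_usd / 1e9:.2f}B")  # ~$16.25B
```

Even this conservative floor sits in the tens-of-billions range before adding interconnects, cooling, facilities, and power infrastructure.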
In the end, Musk's plan for xAI is technically plausible but financially and logistically daunting.
Via Tom's Hardware