- Most AI GPUs in production run at remarkably low utilization
- Businesses pay for roughly 20 times more GPU capacity than their workloads need
- Overprovisioning is worsening sharply year over year instead of improving
Tech companies are rushing to buy massive amounts of AI infrastructure, but most of that hardware is doing virtually no useful work.
A report from Cast AI, based on tens of thousands of Kubernetes clusters across AWS, Azure, and GCP, found that the average GPU utilization sits at just 5%.
Many teams deploy sophisticated AI tools to manage their applications, but those same tools are not used to optimize the underlying infrastructure.
The numbers are getting worse, not better
Organizations pay for approximately 20 times more GPU capacity than their workloads are actually using at any given time.
The figures come from direct measurements of production clusters and millions of compute resources, taken before any optimization.
“This is the third year that we have published this report. The numbers are worse,” says Laurent Gil, co-founder and president of Cast AI. “CPU usage fell to 8%, down from 10%. Memory dropped from 23% to 20%.”
The report also measures what’s called overprovisioning, which is the gap between what workloads actually need and what teams allocate to them.
CPU overprovisioning increased from 40% to 69% year-over-year, while memory overprovisioning now stands at 79%.
This means that organizations are reserving nearly twice as much CPU and four times as much memory as their workloads actually consume.
In short, organizations are paying for infrastructure their workloads don't require, and the gap is widening rather than closing.
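To make the definition concrete, here is a minimal sketch of the overprovisioning arithmetic. The report's exact formula isn't published here, so this assumes overprovisioning measures how far a reservation exceeds actual use; the workload figures are illustrative, not from the report.

```python
# A minimal sketch of the overprovisioning arithmetic described above.
# Assumption: overprovisioning = (requested - used) / used, i.e. how far
# the reservation exceeds measured consumption. Figures are illustrative.

def overprovisioning(requested: float, used: float) -> float:
    """Fraction by which a resource reservation exceeds measured usage."""
    if used <= 0:
        raise ValueError("used must be positive")
    return (requested - used) / used

# Example: a workload that reserves 16.9 vCPUs but consumes only 10.
print(f"CPU overprovisioning: {overprovisioning(16.9, 10.0):.0%}")  # -> 69%
```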
The waste becomes far more expensive once GPUs enter the picture. An idle CPU core costs pennies per hour; an idle GPU costs dollars per hour.
For the first time since EC2 launched in 2006, GPU prices are going up instead of down.
In January 2026, AWS raised prices for H200 Capacity Blocks by 15%, citing supply and demand and breaking a two-decade precedent.
“At a 5% utilization rate, the math doesn’t work,” the report says. The hoarding instinct makes sense because delivery times are long, but that same hoarding feeds the scarcity loop that drives prices up even more.
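For a sense of scale, here is the back-of-the-envelope version of that math. The hourly rate is an illustrative placeholder, not a quoted cloud price; only the 5% utilization figure comes from the report.

```python
# Back-of-the-envelope math behind "at 5% utilization, the math doesn't work".
# The $4/hour rate is a hypothetical placeholder, not a quoted AWS price.

list_price_per_hour = 4.00   # illustrative on-demand GPU rate, USD
utilization = 0.05           # the 5% average from the report

# Effective price per hour of *useful* GPU work: the list price divided
# by the fraction of time the GPU is actually doing something.
effective_price = list_price_per_hour / utilization
print(f"${effective_price:.2f} per useful GPU-hour")  # -> $80.00, i.e. 20x
```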
Not all clusters perform this poorly: one organization achieved 49% utilization on H200s and 30% on H100s, well above the 5% average.
The difference is automation, not luck or better hardware. The tools to solve this problem already exist: automated resizing, GPU sharing via time slicing, and Spot instance management.
Most teams never get there, however, because overprovisioning feels safer than running short of capacity, and that safety comes at a steep price.
Teams that closed the gap stopped viewing resource efficiency as a one-time manual task and started treating it as an automated, ongoing process.
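As an illustration of what that automated, ongoing process can look like, here is a toy rightsizing rule of the kind an autoscaler might apply continuously; the 20% headroom and the function itself are assumptions for the sketch, not Cast AI's implementation.

```python
# A toy version of automated rightsizing: tighten each resource request
# to observed peak usage plus a safety headroom, and repeat on a schedule
# rather than as a one-time manual exercise.
# The 20% headroom and these names are illustrative assumptions.

def rightsize_request(observed_peak: float, headroom: float = 0.20) -> float:
    """Return a tightened resource request based on measured usage."""
    return observed_peak * (1.0 + headroom)

# A workload that reserved 16 vCPUs but peaks at 5 gets resized to 6,
# cutting the idle reservation by more than half.
print(rightsize_request(5.0))  # -> 6.0
```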
But Cast AI's data suggests that most companies are willing to keep overpaying rather than change their ways.