- Google unveils next-generation TPUs – split into two series, 8t and 8i
- 8t superpods can deliver 121 ExaFlops, up from 42.5 last year
- 8i offers 3x more SRAM and increased HBM
Google Cloud announced its eighth-generation Tensor Processing Units (TPUs), designed specifically for the agentic shift currently under way in AI.
Revealed at Google Cloud Next 2026, the upgrades focus on longer context windows, multi-step reasoning, and responsiveness at scale. Google's cloud infrastructure is therefore being rebuilt to support persistent memory, continuous inference, and multi-model workloads.
This year brings two separate TPUs, each built around massively scaled HBM, with Google Cloud emphasizing memory bandwidth as much as compute.
8t and 8i TPUs target training billions of parameters in million-chip clusters
The first of the two TPUs, 8t, is optimized to be distributed across huge clusters for training base models. With an approximately 80% year-over-year improvement in performance per dollar, the company says it will train models with billions of parameters more efficiently.
Google Cloud explained that a single 8t superpod can scale up to 9,600 chips, providing 2 PB of shared HBM and 121 ExaFlops of compute. For comparison, last year's Ironwood topped out at 9,216 chips per superpod and 42.5 ExaFlops.
Google Cloud also warned of “the wall of latency” we face in a permanent agentic era, hence the launch of 8i, a second chip that serves as a post-training and inference engine.
The TPU 8i gets approximately 3x more on-chip SRAM at 384MB, as well as 288GB of HBM. Pod size also rises to 1,152 chips from 256, delivering 11.6 ExaFlops of performance (up from 1.2 ExaFlops).
When it comes to energy and thermal efficiency, the new chips offer up to 2x better performance per watt than their predecessor, Ironwood.
“We have innovated hardware and software to enable our data centers to deliver six times more computing power per unit of electricity than five years ago,” explained Amin Vahdat, executive vice president and chief technologist for AI and infrastructure.
General availability for Google Cloud customers is expected in the coming months and, of course, the TPU 8t and TPU 8i will be at the forefront of the latest Gemini models.
The company also sees eighth-generation hardware playing a role in the development of the next cutting-edge models by distributing training beyond a single superpod using Pathways and JAX, unlocking scaling beyond a million TPU chips per training cluster. Executives confirmed at the event that this remains entirely theoretical (though technically possible), with TPUs not yet available at such a scale.
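To give a flavor of the programming model involved, here is a minimal sketch of how JAX's sharding APIs spread an array computation across many accelerator chips, which is the mechanism superpod-scale training builds on. The shapes and axis name are illustrative assumptions, not Google's actual configuration, and the code falls back to whatever devices are present (e.g. a single CPU) when no TPUs exist.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all visible devices into a 1D mesh with a "data" axis.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard the batch along the "data" axis; replicate the weights everywhere.
batch = jax.device_put(jnp.ones((8, 128)), NamedSharding(mesh, P("data", None)))
weights = jax.device_put(jnp.ones((128, 64)), NamedSharding(mesh, P(None, None)))

@jax.jit
def forward(x, w):
    # The XLA compiler inserts any cross-chip communication automatically.
    return x @ w

out = forward(batch, weights)
print(out.shape)  # (8, 64)
```

The same program scales from one chip to thousands by changing only the mesh, which is the property that frameworks like Pathways exploit at larger scale.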