- MI355X leads AMD's new MI350 series with 288 GB of memory and top-end performance
- AMD drops APU integration, focusing on GPU-only flexibility at rack scale
- FP6 and FP4 data types highlight the MI355X's inference-optimized design choices
AMD has unveiled its new Instinct MI350X and MI355X GPUs for AI workloads at its Advancing AI 2025 event, offering two options built on its latest CDNA 4 architecture.
While the two share a common platform, the MI355X stands out as the high-performance, liquid-cooled variant designed for demanding large-scale deployments.
The MI355X supports configurations of up to 128 GPUs per rack and delivers high throughput for both training and inference workloads. It carries 288 GB of HBM3E memory and 8 TB/s of memory bandwidth.
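To put those two headline figures in context, here is a quick back-of-the-envelope sketch in Python. The arithmetic is generic and not taken from AMD's materials, and the FP4 sizing assumes a simple half-byte-per-weight model.

```python
# Back-of-the-envelope sizing from the MI355X's stated memory specs.
# These are generic estimates, not AMD-published figures.

HBM_CAPACITY_GB = 288     # HBM3E capacity per GPU (from the article)
HBM_BANDWIDTH_TBS = 8.0   # peak memory bandwidth in TB/s (from the article)

# At FP4, each weight occupies half a byte:
bytes_per_param_fp4 = 0.5
max_params_billion = HBM_CAPACITY_GB / bytes_per_param_fp4
print(f"~{max_params_billion:.0f}B FP4 parameters fit in {HBM_CAPACITY_GB} GB")  # ~576B

# Lower bound on the time to stream the full memory once at peak bandwidth,
# a rough proxy for one memory-bound decoding pass over all weights:
stream_time_ms = HBM_CAPACITY_GB / (HBM_BANDWIDTH_TBS * 1000) * 1000
print(f"Full-memory sweep: ~{stream_time_ms:.0f} ms at peak bandwidth")  # ~36 ms
```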
GPU-only design
AMD claims the MI355X offers up to 4x the AI compute and 35x the inference performance of its previous generation, thanks to architectural improvements and a move to TSMC's N3P process.
Inside, the chip packs eight compute dies with 256 active compute units and 185 billion transistors in total, a 21% increase over the previous model. The dies connect through redesigned I/O tiles, cut from four to two, which doubles internal bandwidth while lowering power consumption.
The MI355X is a GPU-only design, abandoning the CPU-GPU APU approach used in the MI300A. AMD says this decision better supports modular deployment and rack-scale flexibility.
It connects to the host over a PCIe 5.0 x16 interface and communicates with peer GPUs over seven Infinity Fabric links, reaching more than 1 TB/s of GPU-to-GPU bandwidth.
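A simple division puts that aggregate number in perspective against the host link. The per-link figure below is inferred from the article's total, not an AMD-published spec, and the PCIe 5.0 x16 rate used is the standard ~64 GB/s per direction.

```python
# Rough comparison of the MI355X's host link vs. its peer links.
PCIE5_X16_GBS = 64       # approx. unidirectional PCIe 5.0 x16 bandwidth (standard figure)
IF_AGGREGATE_GBS = 1000  # >1 TB/s aggregate over the fabric (from the article)
IF_LINKS = 7             # Infinity Fabric links per GPU (from the article)

per_link = IF_AGGREGATE_GBS / IF_LINKS
print(f"~{per_link:.0f} GB/s per Infinity Fabric link")   # ~143 GB/s (inferred)
print(f"Peer fabric is ~{IF_AGGREGATE_GBS / PCIE5_X16_GBS:.0f}x the host PCIe link")  # ~16x
```

That gap is why GPU-to-GPU traffic stays on the fabric while PCIe is reserved for host communication.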
Each HBM stack is paired with 32 MB of Infinity Cache, and the architecture supports newer, lower-precision formats such as FP4 and FP6.
The MI355X executes FP6 operations at FP4 rates, a capability AMD highlights as beneficial for inference-heavy workloads. It also offers 1.6x the HBM3E memory capacity of Nvidia's GB200 and B200, although memory bandwidth is similar. AMD claims an inference performance lead of 1.2x to 1.3x over Nvidia's best products.
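For readers unfamiliar with these formats, the sketch below shows what quantizing weights to an FP4 (E2M1) grid looks like in principle. The value grid follows the OCP microscaling FP4 definition; the per-tensor scaling is a simplification for illustration and is not AMD's implementation.

```python
# Minimal sketch of FP4 (E2M1-style) quantization, to illustrate what an
# "FP4 data type" means in practice. Simplified per-tensor scaling, not
# AMD's actual scheme.
import numpy as np

# All magnitudes representable in E2M1 (2 exponent bits, 1 mantissa bit):
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Scale a tensor into FP4 range, snap to the nearest grid point, rescale."""
    scale = np.abs(x).max() / FP4_GRID.max()   # one scale per tensor (simplified)
    mags = np.abs(x) / scale
    # Snap each magnitude to the nearest representable FP4 value:
    idx = np.argmin(np.abs(mags[..., None] - FP4_GRID), axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale

w = np.random.randn(4, 4).astype(np.float32)
w4 = quantize_fp4(w)
print("max abs quantization error:", np.abs(w - w4).max())
# Each weight now needs 4 bits instead of 32: 8x less memory traffic per weight.
```

Halving the bits per operand is what lets hardware double math throughput at a given datapath width, which is why running FP6 at FP4 rates matters for inference.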
The GPU draws up to 1,400 W in its liquid-cooled form, delivering higher performance density per rack. AMD says this improves TCO by letting users scale compute without expanding their physical footprint.
The chip fits standard OAM modules and is compatible with UBB-based server platforms, speeding up deployment.
“The world of AI isn't slowing down, and neither are we,” said Vamsi Boppana, SVP of AMD's AI Group. “At AMD, we're not just keeping pace, we're setting the bar. Our customers demand real, deployable solutions that scale, and that's exactly what we deliver with the AMD Instinct MI350 series. With leading performance, massive memory bandwidth, and a flexible, open infrastructure, we're empowering innovators across every industry to move faster and build what's next.”
AMD plans to launch its Instinct MI400 series in 2026.