China built a giant AI supercomputer without an Nvidia GPU, using millions of Huawei Arm cores instead

Huawei-linked LineShine supercomputer packs 2.45 million Arm cores into massive AI cluster
Huawei processors now power one of China’s largest AI computing facilities
CPU-only supercomputers eliminate costly data transfers between processors and accelerators during workloads

China has deployed a massive CPU-only supercomputer called LineShine that delivers 1.54 exaflops of AI training performance without using a GPU at all.

The system contains 20,480 compute nodes, each containing two LX2 processors for a total of 40,960 chips across the entire machine.

Each LX2 processor has 304 CPU cores, meaning the entire supercomputer uses around 2.45 million Armv9 cores in total.
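The node, chip, and core counts are simple products, so they are easy to check. The short Python sketch below uses only the figures quoted in this article, and the variable names are illustrative. Multiplying the stated chip count by the stated cores per chip gives roughly 12.45 million rather than 2.45 million, so one of the reported figures may be off or may describe a different configuration; the article does not say which.

```python
# Figures as quoted in the article; names are illustrative only.
NODES = 20_480
CHIPS_PER_NODE = 2
CORES_PER_CHIP = 304

total_chips = NODES * CHIPS_PER_NODE
total_cores = total_chips * CORES_PER_CHIP

print(f"chips: {total_chips:,}")
print(f"cores: {total_cores:,} (~{total_cores / 1e6:.2f} million)")
```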

Inside the LX2 processor’s unusual architecture

The processor was developed either by Huawei alone or jointly with China’s National Supercomputing Center; its exact origin remains unknown.

Each LX2 processor uses two compute chiplets, with its cores organized into eight clusters of 38 cores each.

Each core includes Arm’s Scalable Vector Extension and Scalable Matrix Extension units, which accelerate the matrix operations used in AI training.

The processor delivers 60.3 teraflops of FP64 performance, 240 teraflops of BF16 throughput, and 960 TOPS of INT8 throughput from a single chip.
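To put the per-chip numbers in context, the sketch below computes the ratios between precisions and scales the peak figures to the 40,960 chips quoted earlier. The final ratio is only a rough illustration: it assumes the 1.54 exaflops figure was measured in BF16 across every chip, which the article does not state.

```python
# Per-chip peak figures and system size as quoted in the article.
FP64_TFLOPS = 60.3
BF16_TFLOPS = 240.0
INT8_TOPS = 960.0
TOTAL_CHIPS = 40_960
REPORTED_AI_EXAFLOPS = 1.54

# How much throughput each drop in precision buys on one chip.
bf16_vs_fp64 = BF16_TFLOPS / FP64_TFLOPS
int8_vs_bf16 = INT8_TOPS / BF16_TFLOPS

# Aggregate theoretical peaks (1 exaflop = 1e6 teraflops).
peak_fp64_exaflops = FP64_TFLOPS * TOTAL_CHIPS / 1e6
peak_bf16_exaflops = BF16_TFLOPS * TOTAL_CHIPS / 1e6

# ASSUMPTION: the reported figure is BF16 on all 40,960 chips.
implied_fraction_of_peak = REPORTED_AI_EXAFLOPS / peak_bf16_exaflops

print(f"BF16 vs FP64: {bf16_vs_fp64:.2f}x, INT8 vs BF16: {int8_vs_bf16:.2f}x")
print(f"peak FP64: {peak_fp64_exaflops:.2f} EF, peak BF16: {peak_bf16_exaflops:.2f} EF")
print(f"reported / BF16 peak: {implied_fraction_of_peak:.1%}")
```

Sustained training throughput is normally well below theoretical peak, so a gap between the two numbers is expected under any of these assumptions.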

The memory subsystem combines 32 GB of integrated HBM offering up to 4 TB/s of bandwidth with up to 256 GB of out-of-package DDR5 memory.
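A quick way to read these memory figures is to compare capacity across the two tiers and to relate HBM bandwidth to the chip's peak compute. The sketch below does both using the article's numbers. The "flops per byte" values are the standard roofline-style break-even points, computed here from peak figures only, and the DDR5 capacity is the stated maximum rather than a confirmed configuration.

```python
# Per-chip figures as quoted in the article.
HBM_GB = 32
HBM_TB_PER_S = 4.0
DDR5_MAX_GB = 256          # "up to" value; actual configuration is not stated
FP64_TFLOPS = 60.3
BF16_TFLOPS = 240.0
CHIPS_PER_NODE = 2

# Capacity: the fast tier is a small share of the total pool.
per_chip_gb = HBM_GB + DDR5_MAX_GB
per_node_gb = per_chip_gb * CHIPS_PER_NODE
hbm_share = HBM_GB / per_chip_gb

# Arithmetic intensity (flops per byte read from HBM) a kernel needs
# before peak compute, rather than HBM bandwidth, becomes the limit.
breakeven_fp64 = FP64_TFLOPS / HBM_TB_PER_S
breakeven_bf16 = BF16_TFLOPS / HBM_TB_PER_S

print(f"per chip: {per_chip_gb} GB, per node: {per_node_gb} GB, HBM share: {hbm_share:.1%}")
print(f"break-even intensity: FP64 {breakeven_fp64:.1f}, BF16 {breakeven_bf16:.1f} flops/byte")
```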

CPU-only systems offer several advantages for complex scientific tasks that combine AI training with massive data ingestion and preprocessing.

Since everything runs on the same CPU and memory space, they avoid costly and bandwidth-intensive CPU-to-GPU data transfers.
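The cost of that extra copy can be estimated with a simple size-over-bandwidth model. In the sketch below, the 64 GB/s host-to-accelerator link is an assumption chosen for illustration (roughly the order of a PCIe 5.0 x16 connection) and does not come from the article; only the 4 TB/s HBM figure does. The helper `transfer_seconds` is hypothetical and ignores latency and overlap with compute.

```python
def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to move size_gb at a constant bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s

ASSUMED_LINK_GB_PER_S = 64.0   # ASSUMPTION: illustrative CPU-to-GPU link speed
HBM_GB_PER_S = 4000.0          # 4 TB/s, as quoted in the article
working_set_gb = 32.0          # one HBM's worth of data

copy_s = transfer_seconds(working_set_gb, ASSUMED_LINK_GB_PER_S)
hbm_pass_s = transfer_seconds(working_set_gb, HBM_GB_PER_S)

print(f"copy over assumed link: {copy_s * 1000:.0f} ms")
print(f"one pass over HBM:      {hbm_pass_s * 1000:.0f} ms")
```

Under these assumptions the copy takes far longer than a full pass over the same data in HBM, which is the step a single-memory-space design avoids. Real systems hide part of this cost by overlapping transfers with compute.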

Homogeneous CPU-based systems can also expose much larger coherent memory pools by combining HBM with large DDR capacities.

This is useful for handling massive scientific datasets, retrieval-augmented generation, and long context windows that GPU memory limitations cannot easily handle.

The big caveat that comes with this approach

CPU-only systems are generally less energy efficient and deliver lower AI throughput density than GPU-based supercomputers.

This is the main reason why most industry players rely on heterogeneous CPU and GPU architectures for large-scale AI workloads.

China is following this path largely because of U.S. bans on GPU exports, not because CPU-only systems are technically superior for AI tasks.

The LineShine shows that CPUs can successfully perform GPU tasks, but the efficiency gap between the two approaches remains large and is unlikely to close any time soon.

China is making a strategic trade-off, accepting lower performance and higher power consumption in exchange for independence from foreign hardware and software ecosystems such as Nvidia’s GPUs and CUDA.

Whether this trade-off matters for long-term AI development depends entirely on how quickly Chinese manufacturers can close the performance gap with their own GPU designs.

Until then, the LineShine will remain a remarkable technical achievement and a practical necessity, but probably not a model for how most of the world will build AI supercomputers.

Via Tom’s Hardware
