- GPUs handle prefill operations by converting prompts into key-value caches
- SambaNova RDUs generate tokens with high throughput and low latency
- Intel Xeon 6 processors manage workload distribution and execute compiled code
Intel and SambaNova Systems have introduced a shared hardware architecture combining GPUs, SambaNova RDUs, and Intel Xeon 6 processors for large-scale inference workloads.
The system allocates GPUs to prefill operations, RDUs to decoding, and Xeon processors to execution and orchestration tasks in agent-driven environments.
“Agentic AI is moving into production – and the winning model we see is GPUs to start the work, Intel Xeon 6 to run it, and SambaNova RDUs to finish it quickly,” said Rodrigo Liang, CEO and co-founder of SambaNova Systems.
This design is expected to be available in the second half of 2026 for enterprises, cloud providers and sovereign deployments.
The CPU is the execution and control layer
The architecture places Intel Xeon 6 processors at the center of system control, where they manage workload distribution, execute code, and coordinate interactions with tools.
This includes managing compilation, validating output, and maintaining communication between concurrent processes.
“When thousands of coding agents are simultaneously generating tool calls, fetch requests, code constructs, and encrypted inter-agent messages, the processor is not a background component: it is the executive and action layer of the system,” said Harry Ault, CRO of SambaNova.
The statement frames the processor as the primary layer responsible for system behavior, rather than as a supporting component.
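To illustrate that control role, the sketch below shows a minimal CPU-side loop that fans out concurrent tool calls and validates their output before passing results back. It is purely illustrative: the function names, agent requests, and validation rule are assumptions for this example, not part of any announced API.

```python
# Minimal, hypothetical sketch of a CPU-side control loop: the host concurrently
# dispatches agent tool calls, validates their output, and collects the results.
# All names here are illustrative assumptions, not an announced interface.

import asyncio


async def run_tool_call(agent_id: int, request: str) -> str:
    # Stand-in for executing a tool call or fetch request on the host CPU.
    await asyncio.sleep(0.01)  # simulate I/O or execution time
    return f"agent-{agent_id}: result for '{request}'"


def validate(result: str) -> bool:
    # Stand-in for output validation before a result is handed back to an agent.
    return result.endswith("'")


async def control_loop(requests: list[tuple[int, str]]) -> list[str]:
    # Fan out concurrent tool calls, then keep only the validated results.
    results = await asyncio.gather(*(run_tool_call(a, r) for a, r in requests))
    return [r for r in results if validate(r)]


if __name__ == "__main__":
    demo = [(1, "fetch docs"), (2, "compile module"), (3, "query vector DB")]
    print(asyncio.run(control_loop(demo)))
```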
According to SambaNova, Xeon 6 delivers over 50% faster LLVM compile times than Arm-based server processors, along with up to 70% faster vector database performance than other x86 systems.
These figures relate to execution speed in coding and fetching workflows.
In this configuration, GPUs handle the prefill step, converting prompts into key-value caches.
SambaNova RDUs function as a decoding layer, generating tokens at high throughput and low latency.
Xeon 6 processors function as both host processors and execution engines, handling system-level operations and running compiled workloads.
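As a rough sketch of how that prefill/decode split fits together, the following code separates the two phases around a shared key-value cache, with a host-side function coordinating the hand-off. The devices, functions, and token values are stand-ins for illustration, not the companies' actual software stack.

```python
# Illustrative sketch of disaggregated inference: a prefill tier builds the KV
# cache from the prompt, a decode tier generates tokens from it, and the host
# CPU orchestrates the hand-off. Everything here is a hypothetical stand-in.

from dataclasses import dataclass


@dataclass
class KVCache:
    """Key-value cache produced during prefill and consumed during decode."""
    keys: list
    values: list


def prefill(prompt_tokens: list[int]) -> KVCache:
    # Prefill tier (GPUs in the described setup): one pass over the whole
    # prompt to populate the key-value cache.
    keys = [f"K({t})" for t in prompt_tokens]
    values = [f"V({t})" for t in prompt_tokens]
    return KVCache(keys, values)


def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
    # Decode tier (RDUs in the described setup): token-by-token generation,
    # appending each new entry to the cache.
    generated = []
    for _ in range(max_new_tokens):
        next_token = len(cache.keys)  # stand-in for the model's next-token choice
        cache.keys.append(f"K({next_token})")
        cache.values.append(f"V({next_token})")
        generated.append(next_token)
    return generated


def orchestrate(prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    # Host CPU: schedule prefill, hand the cache to the decode tier, return output.
    cache = prefill(prompt_tokens)
    return decode(cache, max_new_tokens)


if __name__ == "__main__":
    print(orchestrate(prompt_tokens=[101, 2009, 2003], max_new_tokens=4))
```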
“Production inference is evolving toward heterogeneous hardware: no single chip type is optimal for every step of an agentic workflow,” said Banghua Zhu, co-founder and CTO at RadixArk.
He added that combining RDUs with Xeon processors allows systems to maintain compatibility with existing software environments.
The system is designed to operate in existing air-cooled data centers without requiring new construction.
According to the companies, this allows for increased inference workloads without additional pressure on water and energy resources.
As Nvidia and Groq continue to focus on improving inference throughput and latency, this announcement adds a layer of competition.
It offers an alternative approach that distributes workloads across multiple hardware layers rather than relying on a single processing model.