- SambaNova runs DeepSeek-R1 at 198 tokens per second using 16 custom chips
- The SN40L RDU chip is claimed to be 3x faster and 5x more efficient than GPUs
- A 5x speed boost is promised soon, with 100x capacity on the cloud by the end of the year
Chinese AI upstart DeepSeek quickly made a name for itself in 2025 with its large-scale open-source R1 model, built for advanced reasoning tasks, which matches the performance of the industry's top models while being more cost-effective.
SambaNova Systems, an AI startup founded in 2017 by experts from Sun/Oracle and Stanford University, has now announced what it claims is the world's fastest deployment of the DeepSeek-R1 671B LLM to date.
The company says it has reached 198 tokens per second, per user, using just 16 custom chips, replacing the 40 racks of 320 Nvidia GPUs that would typically be required.
Independently verified
“Powered by the SN40L RDU chip, SambaNova is the fastest platform running DeepSeek,” said Rodrigo Liang, CEO and co-founder of SambaNova. “This will rise to 5x faster than the latest GPU speed on a single rack – and by the end of the year, we will offer 100x capacity for DeepSeek-R1.”
While Nvidia GPUs have traditionally powered large AI workloads, SambaNova argues that its reconfigurable dataflow architecture offers a more efficient alternative. The company claims its hardware delivers three times the speed and five times the efficiency of leading GPUs while retaining the full reasoning power of DeepSeek-R1.
“DeepSeek-R1 is one of the most advanced open-weight AI models available, but its full potential has been limited by GPU inefficiency,” said Liang. “That changes today. We are delivering the next major breakthrough – collapsing inference costs and cutting hardware requirements from 40 racks to just one – to serve DeepSeek-R1 at the fastest speeds, efficiently.”
George Cameron, co-founder of AI benchmarking firm Artificial Analysis, said his company had “independently benchmarked SambaNova Cloud's deployment of the full 671-billion-parameter DeepSeek-R1 mixture-of-experts model at the fastest output speed we have ever measured for DeepSeek-R1. High output speeds are particularly important for reasoning models, as these models use reasoning output tokens to improve the quality of their responses. SambaNova's high output speeds will support the use of reasoning models in latency-sensitive use cases.”
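To see why per-user output speed matters so much for reasoning models, consider that they may stream thousands of "thinking" tokens before producing a final answer. A minimal back-of-the-envelope sketch (the 5,000-token reasoning budget and the 50 tok/s baseline are assumed figures for illustration, not numbers from SambaNova):

```python
def time_to_answer(reasoning_tokens: int, tokens_per_second: float) -> float:
    """Seconds a user waits while the model streams its reasoning tokens."""
    return reasoning_tokens / tokens_per_second

# Assume a hard question triggers 5,000 reasoning tokens (hypothetical figure).
# 198 tok/s is SambaNova's claimed per-user rate; 50 tok/s is an assumed
# slower baseline for comparison.
for speed in (50, 198):
    print(f"{speed} tok/s -> {time_to_answer(5000, speed):.0f} s wait")
```

Under these assumed numbers, the same question drops from a roughly 100-second wait to about 25 seconds, which is the difference between an unusable and a tolerable interactive latency.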
DeepSeek-R1 671B is now available on SambaNova Cloud, with API access offered to select users. The company is scaling capacity rapidly and says it hopes to reach a total rack throughput of 20,000 tokens per second “in the near future”.
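The article does not describe the API itself. As a hypothetical illustration, many hosted LLM services accept an OpenAI-style chat-completions request body; a client request for such an endpoint might be assembled like this (the endpoint URL and the `DeepSeek-R1` model identifier are assumptions, not details confirmed in the article):

```python
import json

# Assumed endpoint path, shown only to illustrate the request shape.
ENDPOINT = "https://api.sambanova.ai/v1/chat/completions"

def build_request(prompt: str) -> str:
    """Return the JSON body for a single-turn chat completion request."""
    body = {
        "model": "DeepSeek-R1",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # stream tokens as generated, to benefit from high output speed
    }
    return json.dumps(body)

payload = build_request("Explain mixture-of-experts models briefly.")
```

Streaming (`"stream": True`) is the natural choice here: for a reasoning model, the user can watch output arrive at the advertised per-user token rate instead of waiting for the full response.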