- DeepSeek’s Engram separates static memory from computation, increasing the efficiency of large AI models
- The method reduces high-speed memory requirements by letting DeepSeek models retrieve knowledge through lookups
- Engram supports asynchronous prefetching on multiple GPUs with minimal performance overhead
DeepSeek, in collaboration with Peking University, introduced a new training method called Engram, designed to decouple memory storage from computing processes.
Traditional large language models require high-bandwidth memory for knowledge retrieval and basic calculations, creating a performance and cost bottleneck.
This HBM bottleneck is widely cited as one of the main reasons DRAM prices have risen fivefold in just 10 weeks, as demand for hardware to support large AI models has surged.
Validation and technical approach
The researchers said existing models waste sequential depth on trivial operations, which could otherwise support higher-level reasoning.
Engram allows models to efficiently “search” for critical information without overloading GPU memory, freeing up capacity for more complex reasoning tasks.
The system was tested on a 27 billion parameter model and showed measurable improvements on industry-standard benchmarks.
By performing knowledge retrieval via hashed N-grams, Engram provides static memory access independent of the current context.
The retrieved information is then modulated by a context-aware gating mechanism so that it aligns with the model’s hidden state.
This design allows models to handle long-context inputs more efficiently and supports system-level prefetching with minimal performance overhead.
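As a rough illustration of the idea (not DeepSeek’s actual implementation), the sketch below hashes trailing token N-grams into a static embedding table and mixes the retrieved rows into the hidden state through a learned gate; the class name, table size, hash constants and gating layer are all assumptions.

```python
# Minimal sketch of an N-gram-keyed memory lookup with context-aware gating.
# Hypothetical names and sizes; intended only to illustrate the mechanism.
import torch
import torch.nn as nn

class NGramMemory(nn.Module):
    def __init__(self, num_slots=100_000, dim=512):
        super().__init__()
        self.num_slots = num_slots
        # Static embedding table standing in for the Engram-style memory bank.
        self.table = nn.Embedding(num_slots, dim)
        # Gate deciding how much retrieved memory to mix into the hidden state.
        self.gate = nn.Linear(2 * dim, dim)

    def hash_ngrams(self, token_ids):
        # Hash each trailing bigram of token IDs into a table slot.
        # token_ids: (batch, seq_len) integer tensor.
        prev = torch.roll(token_ids, shifts=1, dims=1)
        prev[:, 0] = 0  # no predecessor for the first token
        return (token_ids * 1_000_003 + prev * 999_983) % self.num_slots

    def forward(self, token_ids, hidden):
        # hidden: (batch, seq_len, dim) hidden states from the Transformer.
        slots = self.hash_ngrams(token_ids)       # deterministic, context-free lookup
        retrieved = self.table(slots)             # (batch, seq_len, dim)
        g = torch.sigmoid(self.gate(torch.cat([hidden, retrieved], dim=-1)))
        return hidden + g * retrieved             # gated injection into the residual stream
```

Because the lookup key depends only on the token IDs, the rows a layer will need can be computed ahead of time, which is what makes the access pattern deterministic and prefetchable.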
The Engram method complements other hardware-side approaches, such as Phison’s AI inference accelerators.
Engram minimizes the amount of high-speed memory required by using static information lookups, making memory usage more efficient.
Phison offers a cost-effective way to expand total memory using SSDs, supporting large AI models that use Engram or Mixture-of-Experts architectures.
Combined, these approaches let AI systems make efficient use of fast, high-bandwidth memory while affordably increasing overall memory capacity.
It also works alongside emerging Compute Express Link (CXL) standards, which aim to overcome GPU memory bottlenecks in large-scale AI workloads.
The method separates static model storage from dynamic computation, improving the Transformer backbone without increasing FLOPs or the number of parameters.
DeepSeek formalized a U-shaped scaling rule to optimize how parameters are allocated between the MoE conditional-computation module and the Engram memory module.
Tests show that reallocating about 20-25% of the sparse parameter budget to Engram yields better performance than pure MoE models, with the gains holding steady across model scales.
Expanding the number of memory slots provides predictable improvements at no additional computational cost.
This supports treating conditional memory as an independent scaling axis for sparse models.
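As a back-of-the-envelope illustration of that reallocation, the toy functions below split a hypothetical sparse-parameter budget between MoE experts and an Engram-style memory table; the 22% fraction, the budget figure and the embedding width are assumptions loosely based on the 20-25% range reported above.

```python
# Toy illustration of splitting a sparse-parameter budget between MoE experts
# and an Engram-style memory table. Numbers are illustrative, not the paper's recipe.

def split_sparse_budget(total_sparse_params: int, engram_fraction: float = 0.22):
    """Return (moe_params, engram_params) for a given sparse-parameter budget."""
    assert 0.0 <= engram_fraction <= 1.0
    engram_params = int(total_sparse_params * engram_fraction)
    return total_sparse_params - engram_params, engram_params

def engram_slots(engram_params: int, embed_dim: int = 1024):
    """How many memory slots fit in the Engram share at a given embedding width."""
    return engram_params // embed_dim

# Hypothetical 10B sparse budget: ~7.8B stays in MoE experts, ~2.2B becomes memory slots.
moe, engram = split_sparse_budget(10_000_000_000, engram_fraction=0.22)
print(moe, engram, engram_slots(engram))
```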
Engram’s deterministic fetch mechanism allows memory capacity to scale linearly across multiple GPUs while supporting asynchronous prefetching during inference.
It offloads static knowledge reconstruction from lower layers, freeing attention mechanisms to focus on the overall context.
Hierarchical caching of frequently used embeddings improves efficiency, and the module works with existing GPU and system memory architectures, potentially avoiding costly HBM upgrades.
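A minimal sketch of what such prefetching and caching could look like, assuming a host-resident embedding table whose rows are paged onto the GPU over a side CUDA stream; the class name, eviction policy and cache size are illustrative assumptions, not DeepSeek’s system design.

```python
# Sketch of hierarchical caching with asynchronous prefetch of memory rows.
# Requires a CUDA device; policies here are deliberately simple.
import torch

class PrefetchingCache:
    def __init__(self, host_table: torch.Tensor, capacity: int = 4096):
        self.host_table = host_table.pin_memory()  # full table stays in pinned host DRAM
        self.capacity = capacity
        self.gpu_cache = {}                        # slot id -> GPU row (small "hot" tier)
        self.stream = torch.cuda.Stream()          # side stream for async host->GPU copies

    def prefetch(self, slot_ids: torch.Tensor):
        # Since hashed N-gram keys are known before the layer runs, the needed
        # rows can be copied to the GPU while the main stream keeps computing.
        with torch.cuda.stream(self.stream):
            for s in slot_ids.tolist():
                if s not in self.gpu_cache:
                    if len(self.gpu_cache) >= self.capacity:
                        # Crude FIFO eviction; a real system would use a smarter policy.
                        self.gpu_cache.pop(next(iter(self.gpu_cache)))
                    self.gpu_cache[s] = self.host_table[s].to("cuda", non_blocking=True)

    def gather(self, slot_ids: torch.Tensor):
        # Wait for outstanding prefetches, then assemble the rows the layer needs
        # (assumes prefetch() already covered these slot ids).
        torch.cuda.current_stream().wait_stream(self.stream)
        return torch.stack([self.gpu_cache[s] for s in slot_ids.tolist()])
```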
The technique could ease pressure on expensive memory hardware, particularly in regions like China, where access to HBM from leading suppliers such as Samsung, SK Hynix and Micron lags behind other markets.
Engram’s early validation suggests that models can expand parameter scale and reasoning ability while handling memory demands more efficiently.
This approach can help alleviate memory constraints in AI infrastructure, potentially reducing the sharp price fluctuations of DDR5 DRAM.
Via SCMP