Meta found a way to turn outdated server RAM into cheap hyperscale memory expansion without purchasing new DRAM

Meta recycled retired DDR4 memory instead of purchasing expensive new DRAM
CXL technology transformed abandoned server memory into useful computing capacity
Meta reported up to 25% fewer servers for machine learning inference workloads

Memory shortages, rising DRAM prices and extended delivery times have pushed hyperscalers toward alternatives that not long ago seemed impractical, and Meta has developed a way to reuse old DDR4 memory pulled from decommissioned servers rather than throwing it away.

This approach allows companies to increase server memory capacity without purchasing new DRAM, a cost that researchers call the RAM tax.

This expansion is made possible by Compute Express Link (CXL) technology, which connects older DDR4 modules to newer DDR5 memory pools on the same machine.

Reusing old memory instead of buying new

Meta describes the approach as offering memory expansion at near-zero cost while significantly reducing e-waste and infrastructure emissions.

This strategy comes at a time when memory supply constraints continue to impact server deployment schedules in cloud computing environments around the world.

According to Meta researchers, existing CXL implementations struggled because extended memory provided nearly ten times less bandwidth than local memory.

The company also reported latency levels approximately 60% higher than directly attached memory residing alongside processor sockets inside servers.

Another limitation was that commercial CXL products bundle controllers with DRAM modules, which prevents practical reuse of existing DDR4 inventories at scale.
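A rough back-of-the-envelope model shows why the latency penalty described above matters. The sketch below assumes a hypothetical local memory latency of 100 ns (not a figure from Meta) and applies the roughly 60% penalty reported for CXL-attached memory to estimate average access latency when some fraction of accesses lands in expanded memory. The function name and the access fractions are illustrative assumptions.

```python
# Hypothetical baseline: the article gives no absolute latency figure.
LOCAL_LATENCY_NS = 100.0
# Approximate penalty reported for CXL-attached memory (about 60% higher).
CXL_LATENCY_PENALTY = 0.60


def average_latency_ns(expanded_fraction: float,
                       local_ns: float = LOCAL_LATENCY_NS,
                       penalty: float = CXL_LATENCY_PENALTY) -> float:
    """Weighted average latency for a mix of local and expanded accesses.

    expanded_fraction is the share of memory accesses (0.0 to 1.0) that
    are served from the slower expanded memory.
    """
    if not 0.0 <= expanded_fraction <= 1.0:
        raise ValueError("expanded_fraction must be between 0 and 1")
    expanded_ns = local_ns * (1.0 + penalty)
    return (1.0 - expanded_fraction) * local_ns + expanded_fraction * expanded_ns


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.25, 0.5):
        print(f"{fraction:.0%} expanded accesses: "
              f"{average_latency_ns(fraction):.1f} ns average")
```

Under these assumptions, a workload that sends a quarter of its accesses to expanded memory would see its average latency rise from 100 ns to about 115 ns, which suggests why keeping frequently used data in local memory is central to making the approach work.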

Meta responded by developing an in-house ASIC known as Vistara, specifically designed for low latency, power efficiency and recycled memory usage.

The associated software stack automatically determines the appropriate ratio of local to expanded memory for each workload, and disables expansion for workloads that cannot tolerate the added latency.

“We address these challenges through hardware-software co-design. On the hardware side, we design an in-house CXL ASIC, Vistara, optimized for DRAM reuse, power efficiency and low latency,” Meta said.

“On the software side, we build an optimized solution based on Transparent Page Placement (TPP), determine the appropriate local memory to expanded memory ratio for each workload, and automate configuration per workload, including disabling expanded memory for workloads that cannot tolerate the increased latency.”
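The per-workload policy Meta describes can be pictured with a small sketch. This is not Meta's software and not the TPP kernel interface. The `Workload` and `MemoryConfig` types, the `choose_memory_config` function, the 256 GB local capacity and the tolerance values are all hypothetical, chosen only to illustrate picking a local-to-expanded ratio and switching expansion off for latency-sensitive workloads.

```python
from dataclasses import dataclass

# Approximate latency penalty reported for CXL-attached memory.
CXL_LATENCY_PENALTY = 0.60
# Hypothetical amount of local DDR5 per server.
LOCAL_DRAM_GB = 256


@dataclass(frozen=True)
class Workload:
    name: str
    memory_needed_gb: int
    # Largest fractional rise in average memory latency the workload accepts.
    max_latency_increase: float


@dataclass(frozen=True)
class MemoryConfig:
    local_gb: int
    expanded_gb: int

    @property
    def expansion_enabled(self) -> bool:
        return self.expanded_gb > 0


def choose_memory_config(workload: Workload,
                         local_gb: int = LOCAL_DRAM_GB,
                         penalty: float = CXL_LATENCY_PENALTY) -> MemoryConfig:
    """Pick a local/expanded split, or disable expansion entirely."""
    overflow_gb = max(0, workload.memory_needed_gb - local_gb)
    if overflow_gb == 0:
        # Everything fits in local memory, so no expansion is needed.
        return MemoryConfig(local_gb=workload.memory_needed_gb, expanded_gb=0)

    # Crude assumption: accesses are spread uniformly over all memory, so
    # the share of accesses hitting expanded memory equals its capacity share.
    expanded_share = overflow_gb / workload.memory_needed_gb
    expected_increase = expanded_share * penalty

    if expected_increase > workload.max_latency_increase:
        # The workload cannot tolerate the added latency: disable expansion.
        return MemoryConfig(local_gb=local_gb, expanded_gb=0)
    return MemoryConfig(local_gb=local_gb, expanded_gb=overflow_gb)


if __name__ == "__main__":
    for w in (Workload("cache", 384, 0.25), Workload("latency-critical", 384, 0.05)):
        print(w.name, choose_memory_config(w))
```

The uniform-access assumption is deliberately pessimistic. A page placement scheme such as TPP aims to keep frequently accessed pages in local memory, so a real system would likely tolerate a larger share of expanded memory than this sketch allows.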

Meta claims that the architecture has shown enough practical value to warrant deployment in production environments that handle a range of workloads.

Meta reported that the approach reduced server counts for disaggregated machine learning inference workloads by up to 25%.

Distributed cache systems have reportedly seen average latency reductions of around 29%, even though they rely in part on slower recycled memory.

The results suggest that additional capacity can matter more than raw memory speed when applications are limited by how much memory they have rather than by how quickly it responds.

Interestingly, the same interconnect technology that has caught Meta’s attention is also drawing interest from semiconductor companies developing large accelerator fabrics.

The broader ecosystem includes the work of companies seeking alternatives to proprietary interconnect technologies such as Nvidia’s widely adopted NVLink systems.

Among these is Ultra Accelerator Link, or UAL, a separate initiative backed by AMD, AWS, Google, Microsoft and Meta to connect accelerators from different hardware vendors.

In Meta’s own testing, disaggregated machine learning inference systems and distributed caching infrastructure were the two workloads the researchers looked at directly.

Both saw measurable improvements from the recycled memory approach, with inference systems requiring fewer servers and caches experiencing lower average latency.

Whether DDR4 recycled via CXL becomes standard practice will likely depend on what performance trade-offs remain acceptable outside of hyperscale environments.

Via Blocks and Files
