Breakthrough for Large Scale Computing: 'Memory Disaggregation' Made Practical
May 29, 2017 | University of Michigan Regents
For decades, operators of large computer clusters in both the cloud and high-performance computing communities have searched for an efficient way to share server memory in order to speed up application performance.
Now newly available open-source software developed by University of Michigan engineers makes that practical.
The software is called Infiniswap, and it can help organizations that use Remote Direct Memory Access (RDMA) networks save money and conserve resources by balancing memory loads across machines. Unlike its predecessors, it requires no new hardware and no changes to existing applications or operating systems.
Infiniswap can boost memory utilization in a cluster by up to 47 percent, which can lead to financial savings of up to 27 percent, the researchers say. More efficient use of the memory the cluster already has means less money spent on additional memory.
"Infiniswap is the first system to scalably implement cluster-wide 'memory disaggregation,' whereby the memory of all the servers in a computing cluster is transparently exposed as a single memory pool to all the applications in the cluster," said Infiniswap project leader Mosharaf Chowdhury, U-M assistant professor of computer science and engineering.
"Memory disaggregation is considered a crown jewel in large scale computing because of memory scarcity in modern clusters."
The software lets servers instantly borrow memory from other servers in the cluster when they run out, instead of writing to slower storage media such as disks. Writing to disk when a server runs out of memory is known as "paging out" or "swapping." Disks are orders of magnitude slower than memory, and data-intensive applications often crash or halt when servers need to page.
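To see why the paging destination matters so much, consider a rough back-of-the-envelope model. The sketch below is illustrative only; the latency figures are assumptions of typical orders of magnitude, not measurements from the Infiniswap paper.

```python
# Back-of-the-envelope model of average memory access time when a fraction of a
# workload's pages no longer fit in local DRAM and must be paged out.
# All latency figures are rough, illustrative assumptions.

LOCAL_DRAM_NS = 100        # local memory access, roughly 100 nanoseconds
RDMA_REMOTE_NS = 5_000     # remote memory over a low-latency RDMA network, a few microseconds
DISK_NS = 5_000_000        # disk access, on the order of milliseconds

def avg_access_ns(miss_fraction, backing_store_ns):
    """Average access time when `miss_fraction` of accesses fall outside
    local DRAM and are served from the given backing store."""
    return (1 - miss_fraction) * LOCAL_DRAM_NS + miss_fraction * backing_store_ns

for miss in (0.01, 0.05, 0.25):
    disk = avg_access_ns(miss, DISK_NS)
    remote = avg_access_ns(miss, RDMA_REMOTE_NS)
    print(f"miss rate {miss:>4.0%}: disk-backed {disk/1000:8.1f} us, "
          f"RDMA-backed {remote/1000:6.2f} us, roughly {disk/remote:.0f}x faster")
```

Even when only a few percent of accesses spill out of local memory, the gap between millisecond-scale disks and microsecond-scale remote memory dominates the average, which is why paging to disk can stall a data-intensive application while paging over RDMA often does not.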
Prior approaches to memory disaggregation, proposed by the computer architecture, high-performance computing and systems communities as well as industry, aren't always practical. Beyond requiring new hardware or modifications to existing applications, many depend on centralized control that becomes a bottleneck as the system scales up; if that controller fails, the whole system goes down.
To avoid the bottleneck, the Michigan team designed a fully decentralized structure. With no centralized entity keeping track of the memory status of all the servers, it doesn't matter how large the computer cluster is. Additionally, Infiniswap does not require designing any new hardware or making modifications to existing applications.
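The NSDI paper describes the exact mechanisms; the minimal sketch below only illustrates the general idea of decentralized placement consistent with the article's description. Each server independently probes a small random subset of peers and picks the least-loaded one, so no central coordinator has to track cluster-wide memory state. The peer names and probe count here are hypothetical.

```python
import random

# Minimal sketch of decentralized placement: when a server needs to place a
# slab of paged-out memory, it probes a few randomly chosen peers and picks
# the one with the most free memory. No central coordinator is consulted,
# so the decision costs the same regardless of cluster size.
# Peer names, free-memory values and the probe count are illustrative assumptions.

def choose_remote_host(free_memory_by_host, probes=2):
    """Pick a destination by sampling `probes` peers and taking the least loaded."""
    candidates = random.sample(list(free_memory_by_host),
                               k=min(probes, len(free_memory_by_host)))
    return max(candidates, key=free_memory_by_host.get)

cluster = {"node-01": 12.0, "node-02": 3.5, "node-03": 20.0, "node-04": 7.25}  # free GB per peer
print("place slab on:", choose_remote_host(cluster))
```

Sampling only a couple of peers rather than querying every server keeps each placement decision cheap and local while still spreading memory load reasonably evenly across the cluster.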
"We've rethought the well-known remote memory paging problem in the context of RDMA," Chowdhury said.
The research team tested Infiniswap on a 32-machine RDMA cluster with workloads from data-intensive applications that ranged from in-memory databases such as VoltDB and Memcached to popular big data software Apache Spark, PowerGraph and GraphX.
They found that Infiniswap improves both "throughput" (the number of operations performed per second) and "tail latency" (the time taken by the slowest operations) by an order of magnitude. Throughput improved between 4 and 16 times with Infiniswap, and tail latency by a factor of 61.
"The idea of borrowing memory over the network if your disk is slow has been around since the 1990s, but network connections haven't been fast enough," Chowdhury said. "Now, we have reached the point where most data centers are deploying low-latency RDMA networks of the type previously only available in supercomputing environments."
Infiniswap is being actively developed by U-M computer science and engineering graduate students Juncheng Gu, Youngmoon Lee and Yiwen Zhang, under the guidance of Chowdhury and Kang Shin, professor of electrical engineering and computer science.
The research that led to Infiniswap was funded by the National Science Foundation, Office of Naval Research and Intel. A recent paper on Infiniswap, titled "Efficient Memory Disaggregation with Infiniswap," was presented at the USENIX Symposium on Networked Systems Design and Implementation in March.
Original article by Sue Carney.