Breakthrough for Large Scale Computing: 'Memory Disaggregation' Made Practical
May 29, 2017 | University of Michigan Regents
For decades, operators of large computer clusters in both the cloud and high-performance computing communities have searched for an efficient way to share server memory in order to speed up application performance.
Now, newly available open-source software developed by University of Michigan engineers makes that practical.
The software is called Infiniswap, and it can help organizations that use Remote Direct Memory Access (RDMA) networks save money and conserve resources by stabilizing memory loads among machines. Unlike its predecessors, it requires no new hardware and no changes to existing applications or operating systems.
Infiniswap can boost the memory utilization in a cluster by up to 47 percent, which can lead to financial savings of up to 27 percent, the researchers say. More efficient use of the memory the cluster already has means less money spent on additional memory.
"Infiniswap is the first system to scalably implement cluster-wide 'memory disaggregation,' whereby the memory of all the servers in a computing cluster is transparently exposed as a single memory pool to all the applications in the cluster," said Infiniswap project leader Mosharaf Chowdhury, U-M assistant professor of computer science and engineering.
"Memory disaggregation is considered a crown jewel in large scale computing because of memory scarcity in modern clusters."
The software lets servers instantly borrow memory from other servers in the cluster when they run out, instead of writing to slower storage media such as disks. Writing to disk when a server runs out of memory is known as "paging out" or "swapping." Disks are orders of magnitude slower than memory, and data-intensive applications often crash or halt when servers need to page.
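The borrowing behavior described above can be pictured with a toy model. This is a minimal sketch, not Infiniswap's implementation: the `ToyPager` class, its page counts and its oldest-first eviction rule are all assumptions made for illustration. It only shows the order of preference, with remote memory tried first and disk as the fallback.

```python
from collections import OrderedDict


class ToyPager:
    """Toy model (hypothetical): page out to remote memory before disk."""

    def __init__(self, local_pages, remote_pages):
        self.local_pages = local_pages    # pages that fit in local memory
        self.remote_pages = remote_pages  # pages borrowable from other servers
        self.local = OrderedDict()        # page_id -> data, oldest first
        self.remote = {}                  # pages held in borrowed remote memory
        self.disk = {}                    # pages swapped to slow storage

    def write(self, page_id, data):
        """Store a page locally, evicting the oldest page if memory is full.

        Returns "remote" or "disk" to say where an evicted page went,
        or None if nothing had to be evicted.
        """
        # A fresh write supersedes any older copy that was paged out.
        self.remote.pop(page_id, None)
        self.disk.pop(page_id, None)
        self.local[page_id] = data
        self.local.move_to_end(page_id)
        if len(self.local) <= self.local_pages:
            return None
        victim_id, victim = self.local.popitem(last=False)
        if len(self.remote) < self.remote_pages:
            self.remote[victim_id] = victim  # fast path: borrowed memory
            return "remote"
        self.disk[victim_id] = victim        # slow path: disk swap
        return "disk"
```

In this model a server with room for two local pages and one borrowed page sends its first evicted page to remote memory and only the second one to disk.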
Prior approaches toward memory disaggregation, from the computer architecture, high-performance computing and systems communities as well as industry, aren't always practical. In addition to requiring new hardware or modifications to existing applications, many depend on a centralized controller that becomes a bottleneck as the system scales up. If that controller fails, the whole system goes down.
To avoid the bottleneck, the Michigan team designed a fully decentralized structure. With no centralized entity keeping track of the memory status of all the servers, it doesn't matter how large the computer cluster is. Additionally, Infiniswap does not require designing any new hardware or making modifications to existing applications.
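One way to choose a lender without any central bookkeeping is to probe a couple of randomly chosen servers and borrow from the one with more free memory, a well-known load-balancing idea often called "power of two choices." The article does not describe Infiniswap's placement policy, so the sketch below is an illustration under that assumption; the function name and the `free_memory_gb` dictionary are hypothetical stand-ins for asking each probed server directly.

```python
import random


def pick_remote_server(free_memory_gb, requester, rng=random):
    """Pick a server to borrow memory from without a central controller.

    free_memory_gb is a hypothetical dict of server name -> free memory (GB).
    In a real cluster each value would come from querying that server.
    """
    candidates = [s for s in free_memory_gb if s != requester]
    if not candidates:
        raise ValueError("no remote servers available")
    # Probe at most two random servers instead of scanning the whole cluster.
    probed = rng.sample(candidates, k=min(2, len(candidates)))
    # Borrow from whichever probed server reports more free memory.
    return max(probed, key=lambda s: free_memory_gb[s])
```

Because each decision touches at most two servers, the cost of placing memory in this sketch does not grow with the size of the cluster.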
"We've rethought the well-known remote memory paging problem in the context of RDMA," Chowdhury said.
The research team tested Infiniswap on a 32-machine RDMA cluster with workloads from data-intensive applications that ranged from in-memory databases such as VoltDB and Memcached to popular big data software Apache Spark, PowerGraph and GraphX.
They found that Infiniswap improves both "throughput" (the number of operations performed per second) and "tail latency" (the time taken by the slowest operations) by an order of magnitude. Throughput improved between 4 and 16 times with Infiniswap, and tail latency by up to a factor of 61.
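For readers who want to compute the two metrics on their own measurements, here is a minimal sketch. The function names are hypothetical, and the use of the 99th percentile as the "tail" is an assumption; the article does not say which percentile the researchers reported.

```python
import math


def throughput(num_operations, elapsed_seconds):
    """Operations completed per second."""
    return num_operations / elapsed_seconds


def tail_latency(latencies_ms, percentile=99.0):
    """Nearest-rank percentile of per-operation latencies.

    The default of 99 is an assumption; pass 100 for the single slowest
    operation.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest-rank method: smallest value with at least `percentile`
    # percent of samples at or below it.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

A speedup "by a factor of N" in either metric is then the ratio between the baseline value and the improved value.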
"The idea of borrowing memory over the network if your disk is slow has been around since the 1990s, but network connections haven't been fast enough," Chowdhury said. "Now, we have reached the point where most data centers are deploying low-latency RDMA networks of the type previously only available in supercomputing environments."
Infiniswap is being actively developed by U-M computer science and engineering graduate students Juncheng Gu, Youngmoon Lee and Yiwen Zhang, under the guidance of Chowdhury and Kang Shin, professor of electrical engineering and computer science.
The research that led to Infiniswap was funded by the National Science Foundation, Office of Naval Research and Intel. A recent paper on Infiniswap, titled "Efficient Memory Disaggregation with Infiniswap," was presented at the USENIX Symposium on Networked Systems Design and Implementation in March.
Original article by Sue Carney