ASPLOS’19 – Software-Defined Far Memory for Warehouse-Scale Computers

Hi. I am Google’s WaveNet TTS technology, and on behalf of the authors, I am happy to summarize Google’s work on software-defined far memory.

DRAM is a critical bottleneck for scaling up warehouse-scale computers. The end of Moore’s law has slowed the decline in DRAM cost per GB, while DRAM demand keeps increasing with the growth of in-memory big-data applications. As a solution to this problem, we present a software-based system that creates a slower but cheaper memory tier, or far memory. Our end-to-end system design adapts to workload churn in warehouse-scale computers while meeting application performance targets.

Introducing a far memory tier in modern warehouse-scale computers comes with a unique set of challenges. First, warehouse-scale computers are sensitive to performance per dollar: slowing down applications, even by a few percent, may offset the cost savings from far memory.
Second, the diverse nature of applications in warehouse-scale computers makes per-application optimization for far memory impractical. Third, the behavior of warehouse-scale computers is highly dynamic, so the optimal ratio of far to near memory varies over time.

To this end, we propose and implement a system that proactively compresses idle memory pages in the background to create a far memory tier in software. Because the tier is software-defined, its capacity can be resized dynamically based on application behavior, and the system could be deployed across Google’s datacenters in just a few weeks.
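To make the core mechanism concrete, here is a minimal, hypothetical Python sketch of a compressed far memory tier. It is not the authors' kernel implementation: Python's zlib stands in for whatever compressor the production system uses, a page's idle age is counted in background scan periods, and the names CompressedFarMemory, cold_age_threshold, scan, and access are illustrative assumptions. Pages left untouched for cold_age_threshold scans are compressed into the far tier; touching a compressed page decompresses it back into near memory, which the sketch counts as a promotion.

```python
import zlib

PAGE_SIZE = 4096  # bytes per page (assumed 4 KiB)


class CompressedFarMemory:
    """Toy two-tier memory (hypothetical): hot pages stay raw, cold pages are zlib-compressed."""

    def __init__(self, cold_age_threshold: int) -> None:
        self.cold_age_threshold = cold_age_threshold  # idle scans before a page counts as cold
        self.near: dict[int, bytes] = {}  # page id -> uncompressed bytes (near memory)
        self.far: dict[int, bytes] = {}   # page id -> compressed bytes (far memory)
        self.idle_age: dict[int, int] = {}  # page id -> scans since last access
        self.promotions = 0  # accesses that had to decompress a far page

    def write(self, pid: int, data: bytes) -> None:
        self.far.pop(pid, None)  # a write replaces any compressed copy
        self.near[pid] = data
        self.idle_age[pid] = 0

    def access(self, pid: int) -> bytes:
        if pid in self.far:
            # Far memory hit: decompress the page back into near memory.
            self.near[pid] = zlib.decompress(self.far.pop(pid))
            self.promotions += 1
        self.idle_age[pid] = 0
        return self.near[pid]

    def scan(self) -> None:
        """Background pass: age near pages and compress those that are now cold."""
        for pid in list(self.near):
            self.idle_age[pid] += 1
            if self.idle_age[pid] >= self.cold_age_threshold:
                self.far[pid] = zlib.compress(self.near.pop(pid))

    def far_bytes(self) -> int:
        """Total compressed bytes currently held in the far tier."""
        return sum(len(blob) for blob in self.far.values())


mem = CompressedFarMemory(cold_age_threshold=2)
for pid in range(4):
    mem.write(pid, bytes([pid]) * PAGE_SIZE)
mem.scan()      # every page is now idle for 1 scan
mem.access(0)   # page 0 is touched, so its age resets to 0
mem.scan()      # pages 1-3 reach age 2 and are compressed
print(sorted(mem.near), sorted(mem.far))  # [0] [1, 2, 3]
page = mem.access(1)  # promotion: page 1 is decompressed
print(mem.promotions)  # 1
```

With a threshold of 2, pages 1 to 3 are compressed after two idle scans while the recently touched page 0 stays in near memory; reading page 1 afterward decompresses it and increments the promotion count. In this toy model the threshold is the knob that trades memory savings against promotions.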
At the same time, the system delivers performance competitive with hardware-based solutions, for example, single-digit-microsecond access latency at the tail.

In our presentation, we will cover the key features of our system design, including an algorithm that identifies cold memory pages under a strictly defined service-level objective (SLO), the operating system and node agent implementation, and a machine-learning-based autotuner that continuously optimizes the system based on past behavior.

We look forward to having you join us at 3:30pm on April 15th in Session 2: VM/Memory at ASPLOS 2019.
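This summary does not spell out the cold-page identification algorithm or its SLO, so the following is only a hypothetical Python sketch under stated assumptions. It assumes the SLO caps the promotion rate (accesses per minute to pages that are already compressed) at a fraction of a job's working set, and that the system records how long each page had been idle when it was touched again. A candidate cold-age threshold then causes one promotion for every re-access whose idle age is at least that threshold, so the most aggressive threshold that still meets the SLO can be picked directly. The function names, the 0.2% budget, and the sample data are made up for illustration.

```python
from typing import Optional, Sequence


def promotion_rate(reaccess_idle_ages: Sequence[float], threshold_s: float,
                   window_min: float) -> float:
    """Promotions per minute that a given cold-age threshold would have caused.

    A page re-accessed after being idle for `age` seconds would already have
    been compressed (and therefore promoted on access) iff age >= threshold_s.
    """
    promoted = sum(1 for age in reaccess_idle_ages if age >= threshold_s)
    return promoted / window_min


def pick_cold_age_threshold(reaccess_idle_ages: Sequence[float], window_min: float,
                            working_set_pages: int, max_promotion_fraction: float,
                            candidate_thresholds_s: Sequence[float]) -> Optional[float]:
    """Smallest candidate threshold whose promotion rate stays within the SLO budget.

    Smaller thresholds compress more memory but cause more promotions, so the
    smallest threshold that still meets the budget saves the most memory.
    """
    budget_per_min = max_promotion_fraction * working_set_pages
    for t in sorted(candidate_thresholds_s):
        if promotion_rate(reaccess_idle_ages, t, window_min) <= budget_per_min:
            return t
    return None  # no candidate meets the SLO: compress nothing


# Hypothetical 10-minute observation window for one job (idle ages in seconds).
ages = [30] * 50 + [90] * 15 + [300] * 5 + [900]
threshold = pick_cold_age_threshold(
    ages, window_min=10, working_set_pages=1000,
    max_promotion_fraction=0.002,  # assumed budget: 0.2% of the working set per minute
    candidate_thresholds_s=[60, 120, 240, 480],
)
print(threshold)  # 120
```

Here a 60-second threshold would cause 2.1 promotions per minute, just over the assumed budget of 2, so the sketch settles on 120 seconds. Knobs such as the budget and the candidate thresholds are the kind of parameters an autotuner could adjust from past behavior; how the authors' autotuner does so is not covered here.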
