Avash's Portfolio

Memcached at Scale- How Facebook Scaled and Optimized for Massive Request Volumes


Memcached at Scale- How Facebook Scaled and Optimized for Massive Request Volumes

Facebook, one of the world’s largest social media platforms, handles billions of requests per second and stores trillions of key-value pairs. To manage this scale, Meta uses Memcached, a simple Key-Value cache that stores data in memory. Since users consume a lot of content than they create caching had significant advantages. It reduced latency, reduced load on database servers, and much more. In this blog, we’ll explore Meta’s strategy for scaling their Memcached clusters to handle their high demand and discuss optimizations in these areas to improve performance and storage.

Optimizing Clusters to Reduce Latency for Memcached

When scaling to thousands of servers per cluster, we run into some interesting problems

So let’s discuss how to solve each issue in detail

Reducing Latency by using TCP for Set and Delete, and UDP for Get Requests

Parallelizing and Batching Requests to reduce latency

Parallelizing requests

Preventing Congestion

Optimizing Clusters to Reduce Load on Servers

Reducing Load (on Databases) using Leases

This is one of my favorite techniques in this entire blog. This mechanism not only reduces load (and thereby prevents thundering herds) but also addresses the issue of stale data and is quite intuitive. But before discussing the solution let’s understand in what scenario will the load on database servers increase.

stale values in cache condition

A lease is a 64-bit token that is associated with a key. When there is a cache miss, a lease is assigned to the web server that experienced the cache miss.

Assigning different Pools for different types of data

Different types of data have different access patterns, memory footprints, and quality of service requirements. They can negatively affect each other so they are assigned different server pools.

E.g. A small pool of Memcached servers can be assigned for the keys that are accessed frequently but a cache miss is cheap (A smaller pool means less time is required to search the key) and a large pool for keys that are accessed infrequently but a cache miss is expensive.

Distributing Read Requests across the Replicas

We can create replicas of the Memcached servers within the same pool and replicate the keys. Now when we get read requests, we can distribute them across all the replicas. This reduces the load from the main Memcached server.

Preventing Cascading Failure

In a distributed system, servers going offline is a norm rather than an exception. To mitigate outages a small set of servers known as Gutter is used. It is just a Memcached server that is only used when any other Memcached server is offline. When the server sends a request to the Memcached server but does not receive any response, it assumes the server is offline and sends the request to the Gutter Pool.

There is another approach, where the key is rehashed and assigned to another server. This is not preferred because it can cause a cascading failure.

cascading failure

Image credits: https://interviewready.io/blog/ratelimiting

For instance, if a highly requested key is assigned to another server, then it might overload that server and crash it. Now, again it’ll be assigned to a different server and repeat the same thing as above.

It’ll continue until many servers crash due to overload, so using Gutter servers is a better option.

Improving Performance by using regions of Memcached servers

There is an upper limit to how many servers we can add to a cluster before the congestion worsens. Therefore it is better to split the web and Memcached servers into multiple frontend clusters. These clusters, along with a storage cluster that contains the databases, define a region.

Making regions of Memcached servers has many benefits like reduced latency because the servers are near the end users. It also mitigates the effects of large-scale power outages, natural calamities, etc. This makes the system more reliable and available.

Removing stale data from Memcached servers in different regions

A single key-value pair may be replicated to multiple Memcached servers in different regions. So, when the data is updated in the database, we have to invalidate the data from those servers. To do so, the following procedure is followed

Optimizing a single Memcached server

A single server can easily become the bottleneck, so it is important to optimize the Memcached server.

Using Adaptive Slab Allocator for Better Memory Management

Allocating and Deallocating memory randomly causes Memory Fragmentation.

Memory fragmentation is when the sum of the available space in a memory heap is large enough to satisfy a memory allocation request but the size of any individual fragment (or contiguous fragments) is too small to satisfy that memory allocation request

memory fragmentation

It increases the read time as the memory gets more and more fragmented. To prevent this, Memcachedpre- allocates a large chunk and divides it into slab classes. The size of slab classes starts from 64 bytes up to 1MB. It stores the data in the smallest possible slab that can fit the data item.

Each slab class maintains a list of available memory and requests for more memory when this free list is empty.

Now, it is worth noting that there might be some wastage of memory inside a slab if the data item is smaller than the slab size (it is called internal fragmentation). But it does not degrade the performance as much as external fragmentation.

Choosing the right eviction policy

Memcached uses LRU (Least Recently Used) eviction policy. Each Slab class has its own LRU data structure (generally we use a linked list). When it cannot allocate any more memory in a slab, the least recently used (in other words, the oldest items) is evicted from the slab class.

This approach has one issue - The eviction rate across slab classes is unbalanced. This can cause performance issues as one slab is constantly evicting and adding data items, while other slabs are not being used. To prevent this, it is important to identify the slabs where the eviction rate is high and the evicted keys can be assigned to other slabs.

So, if a slab class is evicting a data item and it was used at least 20% (this a threshold) more recently than the average of least recently used items in other slab classes, then the data item is moved to any other class.

Reducing Memory Usage by removing short-lived keys

Memcached evicts an item

In short, it removes the data items lazily. Short-lived keys that see a single burst of activity waste memory, until they reach the end of LRU.

To solve this issue, a hybrid method is used to lazily evict most keys and quickly evict short-lived keys when they expire. All the short-lived items are placed in a circular buffer of a linked list. We have a head pointer that iterates over the circular buffer and advances by one each second. Each second, all of the items in the bucket at the head pointer are evicted and the head pointer advances by one.

Using Multi-threading to boost performance

The Memcached server is multi-threaded to boost performance.


In conclusion, Meta’s strategy for scaling their Memcached clusters to handle their high demand involves optimizing clusters and regions of Memcached servers as well as optimizing a single Memcached server. They use techniques such as using TCP for Set and Delete requests, UDP for Get requests to reduce latency, parallelizing and batching requests to minimize round trips, and preventing congestion using a sliding window mechanism. They also use leases to reduce the load on databases and prevent thundering herds, and implement cache invalidation methods to address stale data issues.

These optimizations not only help reduce latency and load on servers, but also prevent cascading failures, thundering herds, and stale data issues, resulting in improved performance and storage for Meta’s Memcached clusters. With these strategies in place, Meta can handle billions of requests per second and store trillions of key-value pairs, efficiently serving content to their users and maintaining a seamless user experience on their social media platform.


References

Memcached White Paper: https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf

Memcached architecture crash course: https://www.youtube.com/watch?v=NCePGsRZFus