Distributed systems are hard. While we learn a lot about building highly available systems, we often overlook resiliency in system design. Sure, we have all heard of fault tolerance, but what is “resilience”? Personally, I like to define it as a system’s ability to handle, and eventually recover from, unexpected conditions. There are several ways to make your systems resilient to failure, but in this post we will focus on the following: