With more than 2,800 customers and 60 billion API calls each month, reliability is a must at cloud-based search company Algolia. During his presentation at TechSummit Amsterdam, Anthony Seure, a Site Reliability Engineer at Algolia, spoke about how sleep was the main constraint when the processing pipeline was redesigned. By keeping the system running, the Site Reliability Engineers, or Sleep Reinforcement Engineers (SREs) as they preferred to be known, made sure the whole system was highly reliable. As much as the team would have liked to call forth Harry Potter’s magic, it relied on crafty use of multiple cloud services and on removing single points of failure.
You can also watch this video at the source.