AWS re:Invent 2014 | (PFC305) Embracing Failure: Fault-Injection and Service Reliability

Complex distributed systems fail. They fail more frequently, and in different ways, as they scale and evolve over time. In this session, you learn how Netflix embraces failure to provide high service availability. Netflix discusses its motivations for inducing failure in production, the mechanics of how it does so, and the lessons learned along the way. Come hear about the Failure Injection Testing (FIT) framework and the suite of tools that Netflix created and currently uses to induce controlled system failures in order to discover vulnerabilities, resolve them, and improve the resiliency of its cloud environment.

Duration: 47:16
Publisher: Amazon Web Services