Managing Risks as a Site Reliability Engineer (SRE)

Liz Fong-Jones and Seth Vargo are back with Season 2 of the DevOps vs. SRE series. In this video, Liz and Seth discuss Risk Analyses and how SREs use a Risk Analysis to understand how likely a service is to exceed the agreed-upon Error Budget. Liz shares anecdotes about common ways her systems fail, like monthly database backups or influxes of commits, while Seth explains how to build a Risk Analysis and how to prioritize reducing the risks using that analysis. Finally, Seth offers some tips about what to do if you can’t mitigate enough risks, including revisiting the Error Budget with key stakeholders.
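
As a rough illustration of the kind of Risk Analysis discussed in the video, here is a minimal Python sketch. It assumes each risk's cost is estimated as expected "bad minutes" per year: (time to detect + time to resolve) × fraction of users affected × incidents per year. That total is then compared against the Error Budget implied by an SLO. The class, the function names, and every number below are hypothetical and are not taken from the video.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


@dataclass
class Risk:
    """One row of a (hypothetical) risk analysis; all values are estimates."""
    name: str
    detect_minutes: float      # estimated time to detect an incident
    resolve_minutes: float     # estimated time to resolve it once detected
    users_affected: float      # fraction of users impacted, 0.0 to 1.0
    incidents_per_year: float  # estimated frequency

    def bad_minutes_per_year(self) -> float:
        # Expected yearly cost of this risk, in user-weighted bad minutes.
        outage = self.detect_minutes + self.resolve_minutes
        return outage * self.users_affected * self.incidents_per_year


def error_budget_minutes(slo: float) -> float:
    """Yearly Error Budget in minutes for an availability SLO such as 0.999."""
    return (1.0 - slo) * MINUTES_PER_YEAR


def rank_risks(risks):
    """Sort risks so the most expensive ones come first."""
    return sorted(risks, key=lambda r: r.bad_minutes_per_year(), reverse=True)


# Illustrative, made-up estimates loosely based on the failure modes mentioned above.
risks = [
    Risk("influx of commits causes a bad release", 15, 45, 0.2, 6),
    Risk("monthly database backup slows queries", 10, 30, 0.5, 12),
]

budget = error_budget_minutes(0.999)
total = sum(r.bad_minutes_per_year() for r in risks)

for r in rank_risks(risks):
    print(f"{r.name}: {r.bad_minutes_per_year():.1f} bad minutes/year")
print(f"total {total:.1f} of {budget:.1f} budgeted minutes; within budget: {total <= budget}")
```

Ranking by expected cost puts the most expensive risks first in line for mitigation. If the total still exceeds the budget after the feasible mitigations, that is the point at which to revisit the Error Budget with stakeholders.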

Risk Analyses tie in closely with the DevOps principles of expecting failure and practicing a blameless culture. This is why we say “class SRE implements DevOps”.

Duration: 4:50
Publisher: Google Cloud
