Managing Risks as a Site Reliability Engineer (SRE)


Liz Fong-Jones and Seth Vargo are back with Season 2 of the DevOps vs. SRE series. In this video, Liz and Seth discuss Risk Analyses and how SREs use a Risk Analysis to estimate how likely a service is to exceed its agreed-upon Error Budget. Liz shares anecdotes about common causes of failure in her systems, like monthly database backups or influxes of commits, while Seth explains how to build a Risk Analysis and use it to prioritize which risks to reduce. Finally, Seth offers tips on what to do if you can’t mitigate enough risks, including revisiting the Error Budget with key stakeholders.
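
As a rough illustration of the kind of Risk Analysis described above, here is a minimal Python sketch. It is not taken from the video. It assumes each risk can be estimated from how often it occurs per year, how long it takes to detect and resolve, and what fraction of users it affects, and that the Error Budget is the downtime allowed by an availability target. All names and numbers are hypothetical.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass
class Risk:
    """One row of a hypothetical risk analysis; every field is an estimate."""
    name: str
    incidents_per_year: float
    minutes_to_detect: float
    minutes_to_resolve: float
    fraction_users_affected: float  # 0.0 to 1.0

    def bad_minutes_per_year(self) -> float:
        # Assumed model: frequency x outage duration x share of users hit.
        duration = self.minutes_to_detect + self.minutes_to_resolve
        return self.incidents_per_year * duration * self.fraction_users_affected


def error_budget_minutes(availability_target: float) -> float:
    """Minutes of unavailability per year allowed by the target."""
    return (1.0 - availability_target) * MINUTES_PER_YEAR


def rank_risks(risks):
    """Order risks from most to least expected bad minutes per year."""
    return sorted(risks, key=lambda r: r.bad_minutes_per_year(), reverse=True)


# Hypothetical inputs, loosely based on the failure causes mentioned above.
risks = [
    Risk("influx of commits ships a bad release", 6, 10, 50, 0.2),
    Risk("monthly database backup overloads the primary", 12, 5, 15, 0.5),
]

budget = error_budget_minutes(0.9999)
ranked = rank_risks(risks)
total = sum(r.bad_minutes_per_year() for r in ranked)

for r in ranked:
    print(f"{r.name}: {r.bad_minutes_per_year():.1f} bad minutes/year")
print(f"total {total:.1f} vs. budget {budget:.1f} minutes/year")
if total > budget:
    print("Expected risk exceeds the budget: mitigate the top risks or revisit the target.")
```

The ranked list suggests where mitigation work pays off first. If the total still exceeds the budget after realistic mitigations, that is the point at which the Error Budget itself would be revisited with stakeholders.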

Risk Analyses tie closely into the DevOps principle of expecting failure and practicing a blameless culture. This is why we say “class SRE implements DevOps”.

Have questions? Reach out to Liz and Seth on Twitter:
@sethvargo – twitter.com/sethvargo
@lizthegrey – twitter.com/lizthegrey

Site Reliability Engineering → http://landing.google.com/sre
SRE Book and Workbook → http://bit.ly/2LAmfIz

Watch more episodes here → http://bit.ly/2PPL6f0
Subscribe to the channel → http://bit.ly/GCloudPlatform


Duration: 4:50
Publisher: Google Cloud