Observability of Distributed Systems


In their previous video, Liz and Seth cut down on alert noise by alerting on Service Level Objectives (SLOs), so that the alerts that remain are actionable. But how can we make our systems observable, instead of only being able to debug the things we thought to monitor in advance?

In this video, you'll learn how structured logs, metrics, and traces help SRE and DevOps practitioners pinpoint where their systems are broken. We'll use metrics to find slow or erroring queries, traces to follow how requests move between components, and logs to understand the errors in more detail.

To get started with these capabilities, Google Cloud offers Stackdriver Service Monitoring.

Reference Links:
Stackdriver Service Monitoring → http://bit.ly/2p5M0Yx
Drill down into Stackdriver → https://bit.ly/2wJdVS7
Improving Reliability with Error Budgets, Metrics, and Tracing in Stackdriver → https://bit.ly/2x8QoKI

Reach out to Liz and Seth:

Watch more episodes from the playlist here → http://bit.ly/2PPL6f0
Subscribe to the Google Cloud Platform channel for more Cloud content → http://bit.ly/GCloudPlatform


Duration: 4:58
Publisher: Google Cloud