Artificial intelligence for IT operations (AIOps) is a way to automate tasks that site reliability engineers (SREs) typically carry out by hand. It aims to make SREs' work easier by reducing the noise coming from systems, surfacing issues faster, and performing root cause analysis by correlating data across different systems.
In this video, we discuss using AIOps and machine learning for root cause analysis, specifically how to find the source of an issue in our systems. We use the OpenTelemetry demo application, which consists of 11 microservices and includes a feature flag service with a UI that lets us trigger a failure whenever a specific product is fetched from the store. After enabling the failure and letting it run for a while, we use distributed tracing in Elastic APM to see how the failure propagates through the system. We then show how machine learning can find correlations between latency and failed transactions to pinpoint the exact product causing the issue.
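The core idea behind correlating failed transactions with a field value can be illustrated with a small sketch. It assumes a simplified transaction record with hypothetical `product_id` and `failed` fields and scores each product by a plain difference in proportions: how often it appears among failed transactions minus how often it appears among successful ones. This is a simplified stand-in, not the statistical scoring Elastic APM actually applies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical, simplified record; real APM transaction documents carry many more fields.
    product_id: str
    failed: bool


def failure_correlation(transactions):
    """Rank product IDs by how over-represented they are among failed transactions.

    Score = share of failed transactions with this value
            minus share of successful transactions with this value.
    A score near +1.0 means the value appears almost exclusively in failures.
    """
    failed = [t for t in transactions if t.failed]
    ok = [t for t in transactions if not t.failed]
    failed_counts = Counter(t.product_id for t in failed)
    ok_counts = Counter(t.product_id for t in ok)

    scores = {}
    for value in set(failed_counts) | set(ok_counts):
        failed_share = failed_counts[value] / len(failed) if failed else 0.0
        ok_share = ok_counts[value] / len(ok) if ok else 0.0
        scores[value] = failed_share - ok_share
    # Highest score first: the most likely culprit.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Synthetic data (hypothetical product IDs): every PRODUCT_B request fails.
    sample = (
        [Transaction("PRODUCT_A", False)] * 40
        + [Transaction("PRODUCT_B", True)] * 10
        + [Transaction("PRODUCT_C", False)] * 50
    )
    for product, score in failure_correlation(sample):
        print(f"{product}: {score:+.2f}")
```

On the synthetic data, PRODUCT_B ranks first with a score of +1.00 because it appears in every failed transaction and in no successful ones, mirroring how a correlation view points at the single product behind the failures.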
00:00 – Root Cause Analysis / Introduction
02:11 – Simulating a failure
03:20 – APM Service Map
05:10 – Failure in Product Catalog Service
06:00 – Latency Distribution
06:38 – Failed Transactions Correlation
08:40 – Confirming the Root Cause
10:30 – Conclusion
– Learn why Elastic Observability was recognized as a Strong Performer in The Forrester Wave: AIOps, Q4 2022 report: https://www.elastic.co/explore/devops-observability/forrester-research-wave-aiops-report
– Take a deeper dive into distributed tracing: https://www.elastic.co/guide/en/apm/guide/current/apm-distributed-tracing.html
– Learn more about AIOps: https://www.elastic.co/observability/aiops
– Learn more about APM: https://www.elastic.co/observability/application-performance-monitoring
– Learn more about Elastic Observability: https://www.elastic.co/observability/
Start the 14-day trial for free! No credit card required: https://cloud.elastic.co/registration?elektra=en-ess-sign-up-page
Subscribe to Elastic’s Community YouTube channel: https://www.youtube.com/c/OfficialElasticCommunity
Elastic is the leading platform for search-powered solutions. We help everyone (organizations, their employees, and their customers) find what they need faster while keeping applications running smoothly and protecting against cyber threats. When you tap into the power of Elastic Enterprise Search, Observability, and Security solutions, you’re in good company with brands like Netflix, Uber, Slack, Microsoft, and thousands of others who rely on us to accelerate results that matter.
#DistributedTracing #AIOps #Observability #DevOps #ElasticObservability #RootCauseAnalysis #MachineLearning #APM