Datadog, a monitoring service for dynamic cloud environments, announced the release of a new machine-learning-based feature called Anomaly Detection. The feature allows engineering teams to quickly identify abnormal behavior within rapidly changing cloud environments, based on historical patterns that are impossible to track manually.
Anomaly Detection works by continuously analyzing historical application performance data to evaluate whether the current state is expected. This differs from traditional monitoring solutions, which pre-define what counts as normal application behavior without accounting for seasonality and trends.
Application throughput, web requests, user logins and other top-level metrics all have pronounced peaks and valleys, and those fluctuations make it difficult to manually set sensible thresholds for alerting or investigation.
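To illustrate the contrast with a fixed threshold, here is a minimal Python sketch of a seasonality-aware check. It is not Datadog's algorithm, which the announcement does not describe in detail. The function names `seasonal_bounds` and `is_anomalous`, the `period` parameter, and the three-standard-deviation band are all assumptions made for illustration.

```python
from statistics import mean, stdev


def seasonal_bounds(history, period, k=3.0):
    """Build one (low, high) band per position in the seasonal cycle.

    history: metric values sampled at a regular interval (hypothetical data).
    period:  number of samples in one cycle, e.g. 24 for hourly data with a daily cycle.
    k:       band width in standard deviations (an illustrative choice).
    """
    # Group past values by their position in the cycle.
    buckets = [[] for _ in range(period)]
    for i, value in enumerate(history):
        buckets[i % period].append(value)

    bounds = []
    for bucket in buckets:
        if not bucket:
            raise ValueError("history must cover at least one full period")
        center = mean(bucket)
        spread = stdev(bucket) if len(bucket) > 1 else 0.0
        bounds.append((center - k * spread, center + k * spread))
    return bounds


def is_anomalous(value, index, bounds):
    """Return True if value falls outside the band for its position in the cycle."""
    low, high = bounds[index % len(bounds)]
    return value < low or value > high


# Three cycles of a metric with four samples per cycle: quiet, ramp, peak, decline.
history = [10, 50, 100, 40,
           12, 52, 98, 42,
           11, 48, 102, 38]
bounds = seasonal_bounds(history, period=4)

peak_is_normal = is_anomalous(100, 2, bounds)    # 100 at the usual peak
quiet_is_spiking = is_anomalous(100, 0, bounds)  # 100 during the quiet slot
```

In this sketch the same value of 100 is treated as normal at the usual peak but flagged during the quiet slot, which a single static threshold cannot express.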
“We are analyzing nearly a trillion data points every day coming from some of the largest companies in the world,” said Homin Lee, Lead Data Scientist, Datadog. “Our algorithms are rooted in classic statistical models but have been heavily adapted and optimized by Datadog for monitoring cloud applications.”
Another existing function of Datadog’s alerting engine is a feature called Outlier Detection, which triggers an alert when a server is behaving significantly differently from its counterparts at a given moment. Combined with the algorithmic alerting of Anomaly Detection and Datadog’s dashboarding technology, it gives engineering teams deep insight into how their application is performing.
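The idea of comparing a server with its counterparts at a single moment can be sketched in Python as follows. This is an illustration only, using the median absolute deviation as an assumed measure of spread; the announcement does not say which method Outlier Detection uses. The function name `find_outliers`, the `tolerance` value, and the host names are hypothetical.

```python
from statistics import median


def find_outliers(values_by_host, tolerance=3.0):
    """Return hosts whose value is far from the group median.

    values_by_host: mapping of host name to the metric value at one moment.
    tolerance:      how many median absolute deviations count as "far"
                    (an illustrative choice).
    """
    values = list(values_by_host.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that a single
    # misbehaving host cannot easily distort.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # At least half the hosts sit exactly on the median; any host that differs stands out.
        return sorted(h for h, v in values_by_host.items() if v != center)
    return sorted(
        h for h, v in values_by_host.items()
        if abs(v - center) / mad > tolerance
    )


# Hypothetical request latency in milliseconds for four web servers.
latency_ms = {"web-1": 90, "web-2": 100, "web-3": 110, "web-4": 500}
outliers = find_outliers(latency_ms)
```

Unlike the historical comparison used for anomaly detection, this check needs no past data: it only asks whether one host differs from its peers right now.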
“It can be challenging to manually configure alerts for metrics which change throughout the day, week, or year,” said Igor Serebryany, Developer Happiness Engineer, Airbnb. “Anomaly detection helps us respond to issues more quickly, while avoiding needlessly paging our engineers.”