Content Deep Dive
Why Use K-Means for Time Series Data? (Part Two)
Blog post from InfluxData
Post Details
Company
Date Published
Author
Anais Dotis-Georgiou
Word Count
1,338
Language
English
Summary
K-Means can be used for anomaly detection in time series data by first windowing the data into fixed-length segments and then clustering those segments. The centroids of the clusters represent the different shapes, or polynomials, that the data takes, and each segment is a point in a 32-dimensional space. Comparing the shape of a segment with the cluster centroids, and its position in that space, makes it possible to detect anomalies: a segment that sits far from every centroid does not match any learned shape. K-Means has limitations, however. It is only guaranteed to converge to a local minimum, so poorly placed initial centroids can lead to poor clustering and poor predictions. Euclidean distance can also be a misleading similarity measure, especially for data with non-uniform time steps or for sensor data.
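The windowing-and-clustering idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the original post: the function names, the sine-wave training signal, k = 2, and the random seed are all assumptions, and the window length of 32 is an assumed reading of the 32-dimensional space mentioned in the summary. The anomaly score used here, the distance from a segment to its nearest centroid, is one common choice rather than the only one.

```python
import math
import random

WINDOW = 32  # assumed window length, matching the 32-dimensional space above


def make_windows(series, size=WINDOW, step=WINDOW):
    """Split a series into fixed-length segments; each is one point in size-D space."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length segments."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm. The result depends on the initial centroids,
    so it is only guaranteed to reach a local minimum."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def anomaly_score(segment, centroids):
    """Euclidean distance from a segment to its nearest centroid."""
    return math.sqrt(min(sq_dist(segment, c) for c in centroids))


# Illustrative training data (an assumption): a clean sine wave with period 32.
normal = [math.sin(2 * math.pi * t / WINDOW) for t in range(WINDOW * 20)]
centroids = kmeans(make_windows(normal), k=2)

normal_err = anomaly_score(normal[:WINDOW], centroids)
flat_err = anomaly_score([0.0] * WINDOW, centroids)  # a flat-lined sensor
shifted_err = anomaly_score(normal[16:16 + WINDOW], centroids)  # same shape, half a period late
```

In this toy setup the flat-lined segment scores far higher than a normal segment, so a threshold on the score would flag it. The half-period-shifted segment scores higher still, even though it has exactly the same shape as the training data, which is one concrete way Euclidean distance can mislead as a similarity measure.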