Anomaly Detection
Forecasting and anomaly detection are useful for time series data that exhibits “seasonality”: cyclical patterns that usually follow human behavior, such as people streaming content at the end of the work day. OpenTSDB supports plugins that analyze stored time series data and run it through algorithms to predict what should happen next. Users can then plot the predictions against the real values, or alerting systems can notify users when thresholds have been exceeded.
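A deliberately simple forecaster can illustrate this workflow: predict from history, then compare the prediction against observed values. This Python sketch is not one of the OpenTSDB plugins. It assumes evenly spaced samples and a known season length, predicts each future point by repeating the most recent season, and flags points that deviate from the prediction by more than a given percentage. The function names are hypothetical.

```python
from typing import List


def seasonal_naive_forecast(history: List[float], season: int, horizon: int) -> List[float]:
    """Predict the next `horizon` points by repeating the most recent full season.

    Assumes evenly spaced samples and at least one full season of history.
    """
    if season <= 0 or len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    # Cycle through the last observed season to fill the forecast horizon.
    return [last_season[i % season] for i in range(horizon)]


def flag_deviations(observed: List[float], predicted: List[float], pct: float) -> List[int]:
    """Return indices where observed differs from predicted by more than pct percent."""
    flagged = []
    for i, (obs, pred) in enumerate(zip(observed, predicted)):
        if pred != 0 and abs(obs - pred) / abs(pred) * 100.0 > pct:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Two "days" of a 4-sample daily cycle with an evening peak.
    history = [10.0, 20.0, 80.0, 30.0, 11.0, 19.0, 82.0, 29.0]
    predicted = seasonal_naive_forecast(history, season=4, horizon=4)
    observed = [10.5, 21.0, 120.0, 28.0]
    print(predicted)                                     # [11.0, 19.0, 82.0, 29.0]
    print(flag_deviations(observed, predicted, pct=15))  # [2]
```

Real plugins such as Prophet learn far richer models. The compare-against-prediction step stays the same.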
Plugins available for anomaly processing include:
Query Flow
Predicting the future behavior of time series data is a non-trivial task, whether it uses a simple statistical algorithm or a trained machine learning model. Computing a forecast requires a fair amount of historical data, anywhere from weeks to years or more. Caching therefore plays an important role in keeping the query layer performant, and anomaly queries can be executed in several modes. OpenTSDB supports various caching systems and strategies. See _TODO_
Query Modes
In the query config, the `mode` field accepts the following values.
CONFIG
In this mode the prediction is not cached. The mode is intended for users who adjust algorithm settings in a UI until they find a fit that works for their use case. Every query fetches the historical data and passes it through the algorithm, so this can be a heavy query.
A prediction for the entire query range is generated using historical data. The prediction can then be compared against the current data in the same query and anomalies can be serialized.
In the future we expect to cache the historical data (as long as the base query is not modified) to improve performance. The model will still have to be retrained with the new settings on each call.
EVALUATE
This mode is used for alerting. The prediction cache is read, and on a miss it is populated for future calls. As in CONFIG mode, a cache miss means the historical data is fetched to compute the prediction. Caches are populated in segments of either one hour or one day of data. Current data within the query time range is compared against the predictions, and any anomalies detected can be serialized.
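The segmenting described for EVALUATE can be sketched as follows. This is a hypothetical Python illustration, not OpenTSDB's cache code. It assumes segments are aligned to UTC hour or day boundaries in epoch seconds and that the cache is a simple map keyed by segment start. Under those assumptions, only segments missing from the cache would need a prediction computed.

```python
from typing import Dict, List

HOUR = 3600
DAY = 24 * HOUR


def cache_segments(start: int, end: int, segment: int) -> List[int]:
    """Start times of every segment-aligned window overlapping [start, end).

    Timestamps are Unix epoch seconds. Aligning windows to multiples of
    `segment` (UTC hour or day boundaries) is an assumption of this sketch.
    """
    if end <= start:
        return []
    first = start - start % segment  # round down to the enclosing boundary
    return list(range(first, end, segment))


def missing_segments(segments: List[int], cache: Dict[int, object]) -> List[int]:
    """Segments with no cached prediction; only these would need training."""
    return [s for s in segments if s not in cache]


if __name__ == "__main__":
    start, end = 1611860520, 1611860520 + 2 * HOUR
    hours = cache_segments(start, end, HOUR)
    print(hours)                            # [1611860400, 1611864000, 1611867600]
    print(cache_segments(start, end, DAY))  # [1611792000]
    cache = {1611860400: "cached prediction"}
    print(missing_segments(hours, cache))   # [1611864000, 1611867600]
```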
PREDICT
This mode can be used to pre-populate the cache instead of relying on EVALUATE calls. It is useful for very expensive queries, where an EVALUATE call may time out on a cache miss. (In that case the query keeps running in the background, and the next EVALUATE call will hopefully find the cache populated.) Predict queries return a 204 (or an empty data set, TODO verify this).
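The three modes differ mainly in how they treat the prediction cache. The following Python sketch models that difference with a plain dictionary. `run_anomaly_query`, the cache key format and the `train` callback are hypothetical stand-ins for fetching historical data and running a model. Whether PREDICT skips segments that are already cached is also an assumption.

```python
from typing import Callable, Dict, List, Optional

Prediction = List[float]


def run_anomaly_query(
    mode: str,
    key: str,
    cache: Dict[str, Prediction],
    train: Callable[[], Prediction],
) -> Optional[Prediction]:
    """Hypothetical dispatch mirroring the CONFIG, EVALUATE and PREDICT modes.

    `key` names a prediction-cache segment and `train` stands in for fetching
    historical data and running the model; both are illustrative assumptions.
    """
    if mode == "CONFIG":
        # Never cached: settings may change between calls, so always retrain.
        return train()
    if mode == "EVALUATE":
        # Read the cache; train and populate it only on a miss.
        if key not in cache:
            cache[key] = train()
        return cache[key]
    if mode == "PREDICT":
        # Pre-populate the cache (skipping present keys is an assumption)
        # and return no data, like the empty/204 response.
        if key not in cache:
            cache[key] = train()
        return None
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    calls = []

    def train() -> Prediction:
        calls.append(1)  # count how often the expensive path runs
        return [1.0, 2.0, 3.0]

    cache: Dict[str, Prediction] = {}
    print(run_anomaly_query("PREDICT", "2021-01-28T19", cache, train))   # None
    print(run_anomaly_query("EVALUATE", "2021-01-28T19", cache, train))  # [1.0, 2.0, 3.0]
    print(len(calls))  # 1: EVALUATE hit the cache filled by PREDICT
    run_anomaly_query("CONFIG", "2021-01-28T19", cache, train)
    print(len(calls))  # 2: CONFIG always retrains
```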
Common Semantic Query Fields
Currently the anomaly algorithms are only supported in the semantic query layer, and a configuration node must be added to the execution graph. The following fields are common to all implementations.
Name | Data Type | Required | Description | Default | Example |
---|---|---|---|---|---|
mode | String | Required | One of `CONFIG`, `EVALUATE` or `PREDICT`. | null | CONFIG |
trainingInterval | String | Optional | A TSDB resolution that determines how far back in time to fetch historical data to train the model with. | null | 3w |
serializeObserved | Boolean | Optional | Whether or not to serialize the current or observed data. The metric and tag sets are unmodified. | false | true |
serializeThresholds | Boolean | Optional | Whether or not to serialize time series with computed thresholds applied. The | false | true |
serializeAlerts | Boolean | Optional | Whether or not to serialize the anomalies. See the Output section below. | false | true |
serializeDeltas | Boolean | Optional | Whether or not to serialize the delta of the observed data from the predicted data. Tags include the | false | true |
upperThresholdBad | Numeric | Optional | For anomaly detection, a numeric threshold that, when the delta of observed and predicted data exceeds the threshold, a | null | 25 |
upperThresholdWarn | Numeric | Optional | For anomaly detection, a numeric threshold that, when the delta of observed and predicted data exceeds the threshold, a | null | 15 |
upperIsScalar | Boolean | Optional | When true, the upper bad and warn thresholds are treated as absolute values above the prediction. When false, the thresholds are treated as percentages. | false | true |
lowerThresholdBad | Numeric | Optional | For anomaly detection, a numeric threshold that, when the delta of observed and predicted data is lower than the threshold, a | null | 25 |
lowerThresholdWarn | Numeric | Optional | For anomaly detection, a numeric threshold that, when the delta of observed and predicted data is lower than the threshold, a | null | 15 |
lowerIsScalar | Boolean | Optional | When true, the lower bad and warn thresholds are treated as absolute values below the prediction. When false, the thresholds are treated as percentages. | false | true |
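The following Python sketch builds a configuration dictionary with the field names from the table, checks `mode`, and converts `trainingInterval` into a training window. It does not show how these fields are wrapped into a semantic query node. The helper names are hypothetical, and the duration parser only handles the `s`, `m`, `h`, `d` and `w` units. Ending the training window at the query start is also an assumption.

```python
import json
import re
from typing import Tuple

# Subset of duration units handled by this sketch (months/years omitted).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 7 * 86400}
VALID_MODES = {"CONFIG", "EVALUATE", "PREDICT"}


def parse_duration(text: str) -> int:
    """Convert a duration such as '3w' or '12h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def training_window(config: dict, query_start: int) -> Tuple[int, int]:
    """Validate the common fields and return (train_start, train_end) in epoch seconds.

    Ending the training window at the query start is an assumption.
    """
    if config.get("mode") not in VALID_MODES:
        raise ValueError("mode must be one of CONFIG, EVALUATE or PREDICT")
    interval = config.get("trainingInterval")
    if interval is None:
        raise ValueError("this sketch requires trainingInterval")
    return query_start - parse_duration(interval), query_start


if __name__ == "__main__":
    config = {
        "mode": "EVALUATE",
        "trainingInterval": "3w",
        "serializeObserved": True,
        "serializeAlerts": True,
        "upperThresholdBad": 25,
        "upperThresholdWarn": 15,
        "upperIsScalar": False,
    }
    print(json.dumps(config, indent=2))
    print(training_window(config, 1611860520))  # (1610046120, 1611860520)
```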
Output
The output of an anomaly plugin often consists of multiple time series, even if only one is fed into the node. By default the prediction is serialized. Its metric name gets a `.prediction` suffix, and a tag is added with the key `_anomalyModel` and the name of the model as its value, e.g. `Prophet`.
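A small function can express the naming convention for the prediction series. This Python sketch is hypothetical. `sys.cpu.user` and `host` are example names, and only the `.prediction` suffix and the `_anomalyModel` tag come from the description above.

```python
from typing import Dict, Tuple


def prediction_series_id(
    metric: str, tags: Dict[str, str], model: str
) -> Tuple[str, Dict[str, str]]:
    """Metric name and tags of the prediction series derived from an input series."""
    new_tags = dict(tags)  # copy so the observed series keeps its original tags
    new_tags["_anomalyModel"] = model
    return metric + ".prediction", new_tags


if __name__ == "__main__":
    print(prediction_series_id("sys.cpu.user", {"host": "web01"}, "Prophet"))
    # ('sys.cpu.user.prediction', {'host': 'web01', '_anomalyModel': 'Prophet'})
```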
If `serializeAlerts` is enabled, an `AlertType` is emitted in the same result set as the prediction. This is a list of timestamps at which the observed data exceeded the configured thresholds relative to the prediction. For example:
```json
{
  "1611860520": {
    "level": "BAD",
    "message": "** TEMP 1.8054497E7 is greater than 1.7681267183127187E7 which is > than 15.0%",
    "value": 1.8054497E7,
    "threshold": 1.7681267183127187E7,
    "type": "upperBad"
  }
}
```
Field definitions:
Name | Data Type | Description |
---|---|---|
level | String | The threshold level, either |
message | String | A message that can be sent to the end user. Note that right now it’s prefixed by |
value | Numeric | The observed value. |
threshold | Numeric | The computed threshold based on the prediction. |
type | String | The threshold exceeded. If a |
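The threshold rules implied by the configuration fields can be sketched in Python. The sketch makes four assumptions:

- A percentage threshold scales the prediction, as prediction * (1 + p/100) above and prediction * (1 - p/100) below. This is consistent with the "15.0%" example.
- A scalar threshold is added to or subtracted from the prediction.
- BAD is checked before WARN.
- The `message` field is left out.

Only the `BAD` level and the `upperBad` type appear in the example. `WARN`, `upperWarn`, `lowerBad` and `lowerWarn` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    upper_bad: Optional[float] = None
    upper_warn: Optional[float] = None
    upper_is_scalar: bool = False
    lower_bad: Optional[float] = None
    lower_warn: Optional[float] = None
    lower_is_scalar: bool = False


def upper_limit(prediction: float, value: float, is_scalar: bool) -> float:
    # Scalar: absolute offset above the prediction; otherwise a percentage of it.
    return prediction + value if is_scalar else prediction * (1 + value / 100.0)


def lower_limit(prediction: float, value: float, is_scalar: bool) -> float:
    # Scalar: absolute offset below the prediction; otherwise a percentage of it.
    return prediction - value if is_scalar else prediction * (1 - value / 100.0)


def evaluate(observed: float, prediction: float, t: Thresholds) -> Optional[dict]:
    """Return an alert shaped like the example (minus `message`), or None."""
    # BAD before WARN so the most severe breach wins; type names are hypothetical.
    checks = [
        ("upperBad", "BAD", t.upper_bad, True, t.upper_is_scalar),
        ("upperWarn", "WARN", t.upper_warn, True, t.upper_is_scalar),
        ("lowerBad", "BAD", t.lower_bad, False, t.lower_is_scalar),
        ("lowerWarn", "WARN", t.lower_warn, False, t.lower_is_scalar),
    ]
    for alert_type, level, value, is_upper, is_scalar in checks:
        if value is None:
            continue  # threshold not configured
        if is_upper:
            limit = upper_limit(prediction, value, is_scalar)
            breached = observed > limit
        else:
            limit = lower_limit(prediction, value, is_scalar)
            breached = observed < limit
        if breached:
            return {"level": level, "value": observed, "threshold": limit, "type": alert_type}
    return None


if __name__ == "__main__":
    t = Thresholds(upper_bad=25, upper_warn=15)
    print(evaluate(130.0, 100.0, t))
    # {'level': 'BAD', 'value': 130.0, 'threshold': 125.0, 'type': 'upperBad'}
    print(evaluate(110.0, 100.0, t))  # None
```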