Downsample
Normalizes and/or reduces the resolution of the source data. For example, if data arrives every second and you want to plot a week of it, that is 604,800 points per series, more than most graphing libraries can handle. Downsampling the data to emit one value every hour makes the plot far more legible and speeds up the query, since there is less data to serialize and less data to process upstream of the downsample node.
The 3.0 downsampler offers some new features, including auto downsampling, infectious NaNs, and interpolation and fill policies.
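The bucketing idea described above can be sketched in a few lines of Python. This is only an illustration of the concept, not the TSD's implementation: the `downsample` function, its parameters, and the `(timestamp, value)` tuple layout are all hypothetical.

```python
from collections import defaultdict

def downsample(points, interval_seconds, aggregate=sum):
    """Group (timestamp, value) pairs into fixed-width buckets and reduce each one."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets[bucket_start].append(value)
    # Emit one aggregated value per non-empty bucket, in time order.
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]

# One value per second for two hours, reduced to one value per hour.
raw = [(t, 1.0) for t in range(0, 7200)]
hourly = downsample(raw, 3600)
print(hourly)  # [(0, 3600.0), (3600, 3600.0)]
```

Passing a different function as `aggregate` (for example `max`) plays the role of the `aggregator` setting in the config.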
Fields for the downsample config include:
Name | Data Type | Required | Description | Default | Example |
---|---|---|---|---|---|
interval | String | Required | The new resolution (or size of buckets) to convert the data to. Formatted as a TSDB duration. May also be | null | 1m |
aggregator | String | Required | The ID of a registered aggregation function in the Registry to use when merging multiple values into a downsample bucket. | null | sum |
infectiousNan | Boolean | Optional | Whether or not NaNs from the source data should infect each bucket when aggregating values. E.g. if one value out of 20 are | false | true |
runAll | Boolean | Optional | Whether or not to merge all of the values for the query span into a single resulting bucket. | false | true |
fill | Boolean | Optional | Whether or not to fill empty buckets with values based on the | false | true |
minInterval | String | Optional | NOTE This is temporary, we’ll clean it out and use | null | 5m |
reportingInterval | String | Optional | When known, the reporting interval of the incoming metric. Some query nodes use it to compute more accurate values, e.g. the rate-to-count function. | null | 5m |
timeZone | String | Optional | An optional Java time zone ID that, when given, switches the downsampling to calendar mode, where bucket boundaries are computed based on the time zone, e.g. accounting for daylight saving time and offsets from UTC. | null | America/Denver |
interpolatorConfigs | List | Required for now | A list of interpolator configs for the downsampler to deal with empty buckets. | null | See Interpolators |
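To make the `infectiousNan` flag concrete, here is a minimal sketch of how a single bucket might be summed under each setting. It assumes that a non-infectious aggregation simply skips NaNs, and that a bucket containing only NaNs yields NaN; the function name and signature are hypothetical and not part of the TSD.

```python
import math

def aggregate_bucket(values, infectious_nan=False):
    """Sum one bucket's values, with NaNs either infecting the result or being skipped."""
    has_nan = any(math.isnan(v) for v in values)
    if has_nan and infectious_nan:
        # A single NaN "infects" the whole bucket.
        return math.nan
    real = [v for v in values if not math.isnan(v)]
    # Assumption: a bucket holding only NaNs has nothing real to sum, so it stays NaN.
    return sum(real) if real else math.nan

bucket = [1.0, 2.0, math.nan, 4.0]
print(aggregate_bucket(bucket))                       # 7.0
print(aggregate_bucket(bucket, infectious_nan=True))  # nan
```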
Note
When a downsample node is present in a query graph and the output is the standard V3 serializer, a timeSpecification will be present in the output and the values will be serialized in an array without timestamps.
Example:
{
  "id": "cpu_ds",
  "type": "downsample",
  "aggregator": "sum",
  "interval": "5m",
  "fill": true,
  "interpolatorConfigs": [{
    "dataType": "numeric",
    "fillPolicy": "NAN",
    "realFillPolicy": "NONE"
  }],
  "sources": ["m1"]
}
timeSpecification
When downsampling is present, a time specification is serialized in the output of the query. This saves time and bytes over the wire, as the query no longer needs to serialize a timestamp for every value. The time specification in a v3 query output has the following fields:
Name | Description |
---|---|
start | The first timestamp in Unix Epoch seconds (or milliseconds if requested). |
end | The last timestamp in Unix Epoch seconds (or milliseconds if requested). |
intervalISO | The interval of the downsample in ISO format. |
interval | The interval as a TSDB duration. |
timeZone | The timezone of the downsampler. |
units | The units of the downsample interval. |
Note
When representing data in a plot or with a timestamp, if the timeZone is NOT equal to UTC, make sure to use a time-zone-aware library to add the interval to the start for each bucket. This ensures the results line up with daylight saving changes, etc.
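As an illustration of the note above, the following sketch rebuilds per-bucket timestamps from a time specification using Python's standard `zoneinfo` module. It only covers day-based intervals, and the `bucket_timestamps` helper and its parameters are hypothetical. It also assumes the system has time zone data available.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def bucket_timestamps(start_epoch, count, interval_days, time_zone):
    """Return the epoch-second timestamp of each bucket for a day-based interval."""
    tz = ZoneInfo(time_zone)
    first = datetime.fromtimestamp(start_epoch, tz)
    stamps = []
    for i in range(count):
        # Adding a timedelta to an aware datetime moves the wall clock,
        # so each bucket stays at the same local time across DST changes.
        bucket = first + timedelta(days=i * interval_days)
        stamps.append(int(bucket.timestamp()))
    return stamps

# Midnight on 2024-03-09 in Denver; US daylight saving time begins on 2024-03-10.
start = int(datetime(2024, 3, 9, tzinfo=ZoneInfo("America/Denver")).timestamp())
stamps = bucket_timestamps(start, 3, 1, "America/Denver")
print([b - a for a, b in zip(stamps, stamps[1:])])  # [86400, 82800]
```

The second gap is an hour shorter than the first because the day on which the clocks move forward has only 23 hours. Adding a fixed 86400 seconds per bucket would drift off local midnight from that point on.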
Auto Intervals
When auto is used as the interval and auto downsampling is configured in the TSD, the TSD takes the start and end time of the query and computes a downsample interval that returns at most about 800 data points for the query range. A stepping configuration is used to determine the final resolution. By default, this stepping config is:
- < 12h use 1m
- >= 12h and < 3d use 15m
- >= 3d and < 1w use 1h
- >= 1w and < 1 month use 6h
- >= 1 month and < 1 year use 1d
- >= 1 year use 1w
To override this configuration, use the tsd.query.downsample.auto.config property. An example looks like:
# ---------- DOWNSAMPLE ----------
tsd.query.downsample.auto.config:
  75d: 1d
  2n: 4h
  1n: 2h
  1w: 1h
  2d: 10m
  1d: 5m
  0: 1m
The configuration is a map where each key is the minimum query span and each value is the downsample interval to use from that span upward. E.g. for queries shorter than 1 day, use 1m as the interval. For 1 day up to 2 days, use 5m as the interval. Anything greater than or equal to 75 days will use 1d as the interval.
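The lookup this map implies can be sketched as follows. This is a guess at the selection rule based on the description above (pick the entry with the largest key that does not exceed the query span), not the TSD's actual code. The helper names are hypothetical, and treating `n` as a 30-day month and `y` as a 365-day year is an assumption of the sketch.

```python
# Seconds per TSDB duration unit. Treating "n" as a 30-day month and "y" as
# a 365-day year is an assumption of this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400,
                "w": 604800, "n": 2592000, "y": 31536000}

def duration_seconds(duration):
    """Convert a duration such as "2d" to seconds; a bare number is taken as seconds."""
    if duration.isdigit():
        return int(duration)
    return int(duration[:-1]) * UNIT_SECONDS[duration[-1]]

def auto_interval(query_span_seconds, config):
    """Pick the interval of the largest threshold not exceeding the query span."""
    thresholds = sorted(config, key=duration_seconds, reverse=True)
    for threshold in thresholds:
        if query_span_seconds >= duration_seconds(threshold):
            return config[threshold]
    raise ValueError("no threshold matches; the config needs a 0 entry")

# The example override from above, with keys written as strings.
config = {"75d": "1d", "2n": "4h", "1n": "2h", "1w": "1h",
          "2d": "10m", "1d": "5m", "0": "1m"}
print(auto_interval(4 * 3600, config))    # 1m
print(auto_interval(36 * 3600, config))   # 5m
print(auto_interval(90 * 86400, config))  # 1d
```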
Example
{
  "start": "1h-ago",
  "executionGraph": [{
      "id": "m1",
      "type": "TimeSeriesDataSource",
      "metric": {
        "type": "MetricLiteral",
        "metric": "sys.if.in"
      }
    },
    {
      "id": "ds1",
      "type": "downsample",
      "aggregator": "sum",
      "interval": "1m",
      "runAll": false,
      "fill": true,
      "interpolatorConfigs": [{
        "dataType": "numeric",
        "fillPolicy": "NAN",
        "realFillPolicy": "NONE"
      }],
      "sources": ["m1"]
    }
  ]
}