Downsample

Normalizes and/or reduces the resolution of the source data. For example, if data arrives every second and you want to plot a week of data, that is far too many data points for most graphing libraries to handle. Downsampling the data to emit one value per hour makes the plot more legible and the query faster, since there is less data to serialize and less data to process upstream of the downsample node.

The 3.0 downsampler offers new features including auto downsampling, infectious NaNs, interpolation, and fill policies.

Fields for the downsample config include:

Name	Data Type	Required	Description	Default	Example
interval	String	Required	The new resolution (or size of buckets) to convert the data to. Formatted as a TSDB duration. May also be `0all` if `runAll` is set to true, in which case the full query timespan is aggregated into a single value. May also be `auto` for automatic downsampling based on the query span.	null	1m
aggregator	String	Required	The ID of a registered aggregation function in the Registry to use when merging multiple values into a downsample bucket.	null	sum
infectiousNan	Boolean	Optional	Whether or not NaNs from the source data should infect each bucket when aggregating values. E.g. if one value out of 20 in a bucket is `NaN` and this is true, the bucket will return `NaN`. If all values in a bucket are `NaN`, the result will be `NaN` regardless.	false	true
runAll	Boolean	Optional	Whether or not to merge all of the values for the query span into a single resulting bucket.	false	true
fill	Boolean	Optional	Whether or not to fill empty buckets with values based on the `interpolatorConfigs`. If false, empty buckets return `null` or `NaN`.	false	true
minInterval	String	Optional	NOTE: This field is temporary; it will be removed in favor of `reportingInterval`. An optional minimum interval to use when `interval` is set to `auto`. This is useful for metrics that are reported on, e.g., a `5m` interval, to avoid extraneous fills when auto downsampling would otherwise choose `1m` based on the query span.	null	5m
reportingInterval	String	Optional	When known, the reporting interval of the incoming metric. Some query nodes use it to compute more accurate values, e.g. the rate-to-count function.	null	5m
timeZone	String	Optional	An optional Java time zone ID that, when given, switches the downsampling to calendar mode where bucket boundaries are computed based on the time zone, e.g. accounting for daylight savings and offsets from UTC.	null	America/Denver
interpolatorConfigs	List	Required for now	A list of interpolator configs for the downsampler to use when dealing with empty buckets.	null	See Interpolators
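
The way `interval`, `aggregator` and `infectiousNan` interact can be illustrated with a minimal Python sketch. This is not the TSD's implementation: the function name `downsample`, its parameters, and the choice to align buckets to multiples of the interval are assumptions made for illustration only, and empty buckets are simply omitted (no fill).

```python
import math

def downsample(points, interval, aggregator=sum, infectious_nan=False):
    """Group (epoch_seconds, value) pairs into fixed-size buckets.

    Hypothetical helper: buckets are aligned to multiples of `interval`
    seconds and empty buckets are not emitted.
    """
    buckets = {}
    for ts, value in points:
        bucket_start = ts - (ts % interval)
        buckets.setdefault(bucket_start, []).append(value)

    result = []
    for bucket_start in sorted(buckets):
        values = buckets[bucket_start]
        real = [v for v in values if not math.isnan(v)]
        has_nan = len(real) != len(values)
        if not real or (infectious_nan and has_nan):
            # All values are NaN, or a single NaN infects the bucket.
            result.append((bucket_start, math.nan))
        else:
            # NaNs are skipped and the remaining values are aggregated.
            result.append((bucket_start, aggregator(real)))
    return result
```

With `infectious_nan=False`, a bucket holding `NaN` and `4.0` aggregates to `4.0`; with `infectious_nan=True`, the same bucket yields `NaN`.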

Note

When a downsample node is present in a query graph and the output is the standard V3 serializer, a timeSpecification will be present in the output and the values will be serialized in an array without timestamps.

Example:

{
  "id": "cpu_ds",
  "type": "downsample",
  "aggregator": "sum",
  "interval": "5m",
  "fill": true,
  "interpolatorConfigs": [{
    "dataType": "numeric",
    "fillPolicy": "NAN",
    "realFillPolicy": "NONE"
  }],
  "sources": ["m1"]
}

timeSpecification

When downsampling is present, a time specification is serialized in the output of the query, saving time and bytes over the wire as the query no longer needs to serialize timestamps. The time specification in a V3 query output has the following fields:

Name	Description
start	The first timestamp in Unix Epoch seconds (or milliseconds if requested).
end	The last timestamp in Unix Epoch seconds (or milliseconds if requested).
intervalISO	The interval of the downsample in ISO format.
interval	The interval as a TSDB duration.
timeZone	The timezone of the downsampler.
units	The units of the downsample interval.

Note

When representing data in a plot or with a timestamp, if the timeZone is NOT equal to UTC, make sure to use a time zone aware library to add the interval to the start for each bucket. This makes sure the results line up with daylight saving changes, etc.
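
As a rough illustration of the note above, the following Python sketch rebuilds per-bucket timestamps from a time specification. The helper names `parse_interval` and `bucket_times` are hypothetical, only the fixed-length duration units `s`, `m`, `h`, `d` and `w` are handled, and it assumes that intervals of a day or longer should follow the local wall clock while shorter intervals are added on the absolute timeline.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Fixed-length TSDB duration units handled by this sketch
# (months and years are deliberately left out).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def parse_interval(interval: str) -> int:
    """Convert a simple TSDB duration such as '5m' or '1d' to seconds."""
    return int(interval[:-1]) * _UNIT_SECONDS[interval[-1]]

def bucket_times(spec: dict, num_values: int, tz: tzinfo = timezone.utc) -> list:
    """Return one time zone aware datetime per serialized value.

    `spec` is the timeSpecification object; `start` is assumed to be
    in Unix Epoch seconds.
    """
    step = parse_interval(spec["interval"])
    start = datetime.fromtimestamp(spec["start"], tz)
    times = []
    for i in range(num_values):
        if step >= 86400:
            # Aware datetime + timedelta keeps the local wall-clock time,
            # so daily buckets stay on local midnight across DST changes.
            times.append(start + timedelta(seconds=i * step))
        else:
            # Sub-day buckets advance on the absolute (epoch) timeline.
            times.append(datetime.fromtimestamp(spec["start"] + i * step, tz))
    return times
```

For a non-UTC downsampler, pass a time zone object built from the `timeZone` field, for example `zoneinfo.ZoneInfo("America/Denver")` on Python 3.9 or later.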

Auto Intervals

When auto is used in the interval and configured in the TSD, the TSD will take the start and end time of the query and compute a downsample that would return at most about 800 data points for the query range. A stepping configuration is used to determine the final resolution. By default, this stepping config is:

< 12h use 1m
>= 12h and < 3d use 15m
>= 3d and < 1w use 1h
>= 1w and < 1 month use 6h
>= 1 month and < 1 year use 1d
>= 1 year use 1w

To override this configuration, use the tsd.query.downsample.auto.config property. An example looks like:

# ---------- DOWNSAMPLE ----------
tsd.query.downsample.auto.config:
  75d: 1d
  2n: 4h
  1n: 2h
  1w: 1h
  2d: 10m
  1d: 5m
  0: 1m

The configuration is a map in which each key is a minimum query span and each value is the downsample interval to use for query spans at or above that key. E.g. with the example above, queries shorter than 1 day use 1m as the interval. Queries from 1 day up to 2 days use 5m. Anything greater than or equal to 75 days uses 1d.
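
The lookup described above can be sketched in Python as follows. This is an illustration rather than the TSD's code: the names `duration_seconds` and `auto_interval` are hypothetical, and the sketch assumes that `n` means a 30-day month and `y` a 365-day year when durations are converted to seconds.

```python
# Assumed unit lengths in seconds; 'n' (month) = 30 days, 'y' = 365 days.
_UNIT_SECONDS = {
    "s": 1, "m": 60, "h": 3600, "d": 86400,
    "w": 7 * 86400, "n": 30 * 86400, "y": 365 * 86400,
}

# The example override from the configuration above.
AUTO_CONFIG = {
    "75d": "1d", "2n": "4h", "1n": "2h", "1w": "1h",
    "2d": "10m", "1d": "5m", "0": "1m",
}

def duration_seconds(duration: str) -> int:
    """Convert a TSDB duration (or the bare key '0') to seconds."""
    if duration == "0":
        return 0
    return int(duration[:-1]) * _UNIT_SECONDS[duration[-1]]

def auto_interval(span_seconds: int, config: dict = AUTO_CONFIG) -> str:
    """Pick the interval whose key is the largest threshold <= the span."""
    thresholds = sorted(
        ((duration_seconds(key), value) for key, value in config.items()),
        reverse=True,
    )
    for threshold, interval in thresholds:
        if span_seconds >= threshold:
            return interval
    raise ValueError("no threshold matches; the config needs a '0' key")
```

Under these assumptions a one hour query resolves to `1m`, a 36 hour query to `5m`, and an 80 day query to `1d`.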

Example

{
  "start": "1h-ago",
  "executionGraph": [
    {
      "id": "m1",
      "type": "TimeSeriesDataSource",
      "metric": {
        "type": "MetricLiteral",
        "metric": "sys.if.in"
      }
    },
    {
      "id": "ds1",
      "type": "downsample",
      "aggregator": "sum",
      "interval": "1m",
      "runAll": false,
      "fill": true,
      "interpolatorConfigs": [{
        "dataType": "numeric",
        "fillPolicy": "NAN",
        "realFillPolicy": "NONE"
      }],
      "sources": ["m1"]
    }
  ]
}
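
For clients that assemble queries programmatically, a query like the one above could be built with a small Python sketch such as the following. The helper names `downsample_node` and `build_query` are hypothetical, and the node IDs `m1` and `ds1` are copied from the example; only fields shown in this document are emitted.

```python
import json

def downsample_node(node_id, source_id, interval, aggregator, fill=True):
    """Build a downsample node config; interval and aggregator are required."""
    if not interval or not aggregator:
        raise ValueError("interval and aggregator are required")
    return {
        "id": node_id,
        "type": "downsample",
        "aggregator": aggregator,
        "interval": interval,
        "fill": fill,
        "interpolatorConfigs": [{
            "dataType": "numeric",
            "fillPolicy": "NAN",
            "realFillPolicy": "NONE",
        }],
        "sources": [source_id],
    }

def build_query(metric, start, interval, aggregator):
    """Build a query with one data source feeding one downsample node."""
    source = {
        "id": "m1",
        "type": "TimeSeriesDataSource",
        "metric": {"type": "MetricLiteral", "metric": metric},
    }
    return {
        "start": start,
        "executionGraph": [
            source,
            downsample_node("ds1", "m1", interval, aggregator),
        ],
    }

def to_json(query):
    """Serialize the query dict to an indented JSON string."""
    return json.dumps(query, indent=2)
```

Calling `to_json(build_query("sys.if.in", "1h-ago", "1m", "sum"))` produces a body equivalent to the example, minus the optional `runAll` field.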