Definitions
When it comes to time series data, many terms are tossed about that can lead to confusion. This page is a glossary that defines terms related to the use of OpenTSDB.
Cardinality
Cardinality is a mathematical term defined as the number of elements in a set. In database lingo, it’s often used to refer to the number of unique items in an index. With regard to OpenTSDB, it can refer to:
The number of unique time series for a given metric
The number of unique tag values associated with a tag name
Due to the nature of the OpenTSDB storage schema, metrics with higher cardinality may take longer to return results during query execution than those with lower cardinality. For example, suppose metric foo has the tag name datacenter with 100 possible values, and metric bar has the tag host with 50,000 possible values. Metric bar has a higher cardinality than foo: 50,000 possible time series for bar and only 100 for foo.
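To make the two meanings of cardinality concrete, the following minimal Python sketch counts both over a small in-memory sample. The sample data and the helper names (series_cardinality, tag_value_cardinality) are illustrative assumptions and are not part of OpenTSDB.

```python
# Hypothetical in-memory sample: each entry is (metric, tags) for one series.
series = [
    ("foo", {"datacenter": "dc1"}),
    ("foo", {"datacenter": "dc2"}),
    ("bar", {"host": "web01"}),
    ("bar", {"host": "web02"}),
    ("bar", {"host": "web03"}),
]


def series_cardinality(series, metric):
    """Number of unique time series (distinct tag sets) for a metric."""
    return len({frozenset(tags.items()) for m, tags in series if m == metric})


def tag_value_cardinality(series, tagk):
    """Number of unique tag values seen for a tag name."""
    return len({tags[tagk] for _, tags in series if tagk in tags})
```

In this sample, bar has the higher cardinality under both definitions, mirroring the foo and bar example above at a much smaller scale.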
Compaction
An OpenTSDB compaction takes multiple columns in an HBase row and merges them into a single column to reduce disk space. This is not to be confused with HBase compactions where multiple edits to a region are merged into one. OpenTSDB compactions can occur periodically for a TSD after data has been written, or during a query.
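The idea of merging many columns into one can be pictured with a toy model. The sketch below is only a conceptual illustration: representing a row as a dictionary of offset to value, and the compacted column as a pair of tuples, are assumptions made for readability and do not reflect the actual byte-level encoding used in HBase.

```python
def compact_row(columns):
    """Merge many single-point columns into one column.

    `columns` maps a per-point qualifier (assumed here to be an offset
    within the row) to a value. The result is a dict with exactly one
    entry whose key and value hold all points, ordered by offset.
    """
    ordered = sorted(columns.items())
    qualifier = tuple(offset for offset, _ in ordered)
    value = tuple(v for _, v in ordered)
    return {qualifier: value}
```

The row still contains every data point after the merge, but it now occupies one column instead of several.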
Data Point
Each metric can be recorded as a number at a specific time. For example, we could record that Sue worked 8 hours at the end of each day, or that “mylogo.jpg” was downloaded 400 times in the past hour. Thus a data point consists of:
A metric
A numeric value
A timestamp when the value was recorded
One or more tags (tag name/value pairs)
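The four parts of a data point can be sketched as a small Python class. The class name DataPoint, the use of Unix epoch seconds for the timestamp, and the to_put_line helper are assumptions for illustration; the output is modeled on the line-oriented put command, so check the API documentation for your OpenTSDB version before relying on the exact format.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    metric: str
    timestamp: int  # assumed here: Unix epoch seconds
    value: float
    tags: dict

    def to_put_line(self):
        """Format the data point as a line-oriented `put` command."""
        if not self.tags:
            raise ValueError("a data point needs at least one tag")
        # Sort tags so the output is deterministic.
        tag_str = " ".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"put {self.metric} {self.timestamp} {self.value} {tag_str}"
```

For example, DataPoint("hours.worked", 1356998400, 8, {"employee": "sue"}).to_put_line() produces a single line containing the metric, timestamp, value and tag in that order.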
Metric
A metric is simply the name of a quantitative measurement. Metrics include things like:
hours worked by an employee
webserver downloads of a file
snow accumulation in a region
Note
Notice that the metric did not include a specific number or a time. That is because a metric is just a label for what you are measuring. The actual measurements are called data points (see Data Point).
Unfortunately OpenTSDB requires metrics to be named as a single, long word without spaces. Thus metrics are usually recorded using “dotted notation”. For example, the metrics above would have names like:
hours.worked
webserver.downloads
accumulation.snow
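As a rough illustration of dotted notation, the hypothetical helper below turns a free-form description into a single word without spaces. The function name to_dotted and its normalization rules (lowercasing, keeping only letters, digits and underscores) are assumptions, not an OpenTSDB feature.

```python
import re


def to_dotted(description):
    """Lowercase a description and join its words with dots."""
    words = re.findall(r"[A-Za-z0-9_]+", description.lower())
    if not words:
        raise ValueError("description contains no usable words")
    return ".".join(words)
```

Note that the helper preserves word order, so "snow accumulation" becomes snow.accumulation. Choosing accumulation.snow instead, as in the list above, is a naming decision that groups related metrics under a common prefix.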
Tags
A metric should be descriptive of what is being measured, but with OpenTSDB, it should not be too specific. Instead, it is better to use tags to differentiate and organize different items that may share a common metric. Tags are pairs of words that provide a means of associating a metric with a specific item. Each pair consists of a tagk that represents the group or category of the following tagv, which represents a specific item, object, location or other noun.
Expanding on the metric examples above:
A business may have four employees, Sue, John, Kelly and Paul. Therefore we may configure a tagk of employee with their names as the tagv. These would be recorded as employee=sue, employee=john, etc.
Webservers usually have many files so we could have a tagk of file to arrive at file=logo.jpg or file=index.php.
Snow falls in many regions so we may record a tagk of region to get region=new_england or region=north_west.
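The tagk=tagv form used in these examples is easy to split programmatically. The following sketch uses hypothetical helper names (parse_tag, parse_tags) and assumes that each pair contains a single = separating a non-empty name from a non-empty value.

```python
def parse_tag(pair):
    """Split a 'tagk=tagv' string into a (tagk, tagv) tuple."""
    tagk, sep, tagv = pair.partition("=")
    if not sep or not tagk or not tagv:
        raise ValueError(f"expected tagk=tagv, got {pair!r}")
    return tagk, tagv


def parse_tags(pairs):
    """Build a {tagk: tagv} mapping from an iterable of 'tagk=tagv' strings."""
    return dict(parse_tag(p) for p in pairs)
```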
Time Series
A collection of two or more data points for a single metric and group of tag name/value pairs.
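One way to picture this definition is to group data points by their metric and tag set. The sketch below is a minimal in-memory illustration; the tuple layout (metric, tags, timestamp, value) and the function name group_into_series are assumptions and say nothing about how OpenTSDB stores series internally.

```python
from collections import defaultdict


def group_into_series(points):
    """Group (metric, tags, timestamp, value) tuples by metric and tag set."""
    series = defaultdict(list)
    for metric, tags, timestamp, value in points:
        # The metric plus the full set of tag pairs identifies one series.
        key = (metric, frozenset(tags.items()))
        series[key].append((timestamp, value))
    # Keep each series ordered by time.
    return {key: sorted(vals) for key, vals in series.items()}
```

Two points with the same metric but different tag values, such as employee=sue and employee=john, end up in separate series.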
Timestamp
Timestamps are simply the absolute time when a value for a given metric was recorded.
Value
A value represents the actual numeric measurement of the given metric. One of our employees, Sue, worked 8 hours yesterday, so the value would be 8. There were 1,024 downloads of logo.jpg from our webserver in the past hour, so the value would be 1024. And 12 inches of snow fell in New England today, so the value would be 12.