OpenTSDB - A Distributed, Scalable Monitoring System

How does OpenTSDB work?

OpenTSDB consists of a Time Series Daemon (TSD) as well as set of command line utilities. Interaction with OpenTSDB is primarily achieved by running one or more of the TSDs. Each TSD is independent. There is no master, no shared state so you can run as many TSDs as required to handle any load you throw at it. Each TSD uses the open source database HBase or hosted Google Bigtable service to store and retrieve time-series data. The data schema is highly optimized for fast aggregations of similar time series to minimize storage space. Users of the TSD never need to access the underlying store directly. You can communicate with the TSD via a simple telnet-style protocol, an HTTP API or a simple built-in GUI. All communications happen on the same port (the TSD figures out the protocol of the client by looking at the first few bytes it receives).

OpenTSDB architecture

Writing

The first step in using OpenTSDB is to send time series data to the TSDs. A number of tools exist to pull data from various sources into OpenTSDB. If you can't find a tool for your needs, you may need to write scripts that collect data from your systems (e.g. by reading interesting metrics from /proc on Linux, collecting counters from your network gear via SNMP, or other interesting data from your applications, via JMX for instance for Java applications) and push data points to one of the TSDs periodically.

StumbleUpon wrote a Python framework called tcollector that is used to collect thousand of metrics from Linux 2.6, Apache's HTTPd, MySQL, HBase, memcached, Varnish and more. This low-impact framework includes a number useful collectors and the community is constantly providing more. Alternative frameworks with OpenTSDB support include Collectd, Statsd and the Coda Hale metrics emitter..

In OpenTSDB, a time series data point consists of:

A metric name.
A UNIX timestamp (seconds or millisecinds since Epoch).
A value (64 bit integer or single-precision floating point value), a JSON formatted event or a histogram/digest.
A set of tags (key-value pairs) that describe the time series the point belongs to.

Tags allow you to separate similar data points from different sources or related entities, so you can easily graph them individually or in groups. One common use case for tags consists in annotating data points with the name of the machine that produced it as well as name of the cluster or pool the machine belongs to. This allows you to easily make dashboards that show the state of your service on a per-server basis as well as dashboards that show an aggregated state across logical pools of servers.

mysql.bytes_received 1287333217 327810227706 schema=foo host=db1
mysql.bytes_sent 1287333217 6604859181710 schema=foo host=db1
mysql.bytes_received 1287333232 327812421706 schema=foo host=db1
mysql.bytes_sent 1287333232 6604901075387 schema=foo host=db1
mysql.bytes_received 1287333321 340899533915 schema=foo host=db2
mysql.bytes_sent 1287333321 5506469130707 schema=foo host=db2

This examples contains 6 data points that belong to 4 different time series. Each different combination of metric and tags makes up a different time series. All of the 4 time series are for one of two metrics mysql.bytes_received or mysql.bytes_sent. A data point must have at least one tag and every time series for a metric should have the same number of tags. It is not recommended to have more than 6-7 tags per data point, as the cost associated with storing new data points quickly becomes dominated by the number of tags beyond that point.

With the tags in the example above, it will be easy to create graphs and dashboards that show the network activity of MySQL on a per host and/or per schema basis. New to OpenTSDB 2.0 is the ability to store non-numeric annotations along with data points for tracking meta-data, quality metrics or other types of information.

Reading

Time series data is usually consumed in the format of a line graph. Thus OpenTSDB offers a built-in, simple user interface for selecting one or more metrics and tags to generate a graph as an image. Alternatively an HTTP API is available to tie OpenTSDB into external systems such as monitoring frameworks, dashboards, statistics packages or automation tools.

Take a look at the resources page for tools contributed by the community for working with OpenTSDB.