FAQ - OpenTSDB - A Distributed, Scalable Monitoring System

Scalability

Can OpenTSDB scale to multiple data centers?

Yes. We recommend that you run one set of Time Series Daemons (TSDs) per HBase cluster and one HBase cluster per physical data center. We do not recommend HBase clusters that span data centers. Instead, you can use HBase replication to replicate tables across data centers.

How much write throughput can I get with OpenTSDB?

It depends mostly on two things:

The size of your HBase cluster.
The CPUs you're using.

If your HBase cluster is reasonably sized, it's unlikely that OpenTSDB will max it out as the TSDs tend to be CPU bound before that happens (unless you run many TSDs). A TSD can easily handle 2000 new data points per second per core on an old dual-core Intel Xeon CPU from 2006. More modern CPUs will get you more throughput.
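As a rough sizing aid, the per-core figure above can be turned into a back-of-the-envelope estimate. This is only a sketch: the function name and the example write rate of 50,000 data points per second are made up for illustration, and the 2000 points per second per core figure is the conservative baseline quoted above, not a measurement on your hardware.

```python
import math

# Baseline quoted in this FAQ: about 2000 data points per second per core
# on a 2006-era dual-core Xeon. Modern CPUs should do better.
POINTS_PER_SEC_PER_CORE = 2000


def cores_needed(points_per_sec, per_core=POINTS_PER_SEC_PER_CORE):
    """Rough number of TSD cores needed to absorb a given write rate."""
    return math.ceil(points_per_sec / per_core)


# Hypothetical write rate, for illustration only.
print(cores_needed(50_000))  # 25
```

Because the baseline is old hardware, treat the result as an upper bound on the cores you are likely to need, and measure on your own machines before committing to a deployment size.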

How much read throughput can I get with OpenTSDB?

Read throughput depends on the cardinality of a metric (how many distinct time series exist), the time span, and the number of data points retrieved. Queries over low-cardinality metrics with few data points execute much faster than queries over high-cardinality metrics with many data points. Most low-cardinality queries for the last day of data return in less than a second. However, huge queries can run for multiple seconds. We're working to optimize the query path.

What type of hardware should I run the TSDs on?

There are no strict requirements. The recommended configuration, however, is a 4-core machine with at least 4GB of RAM, and a tmpfs partition for the cache directory used by the TSD. Having more RAM helps the TSD ride over transient HBase outages by allowing it to buffer more incoming data before getting to the point where it must start discarding data.

HBase region servers are usually beefy machines, and many OpenTSDB users run their TSDs on the same machines, provided there is enough memory for both processes.

How much disk space do I need?

The answer depends mostly on the average number of tags per data point. StumbleUpon uses 4.5 tags on average, and 100+ billion data points take just over a terabyte of disk space (before the 3x replication of HDFS). Enabling compression with LZO or Snappy is strongly recommended in a production setting. In StumbleUpon's case, each data point ends up taking about 12 bytes of disk space (or 36 bytes if you include the 3x replication factor of HDFS). We also find that, on average, LZO achieves a compression factor of 4.2x on the TSD table, but your mileage will vary. Without LZO, a data point costs roughly: 16 bytes of HBase overhead, 3 bytes for the metric, 4 bytes for the timestamp, 6 bytes per tag, 2 bytes of OpenTSDB overhead, and up to 8 bytes for the value. Integers are stored with variable-length encoding and can consume 1, 2, 4 or 8 bytes.
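The per-point cost breakdown above can be added up directly. The following is a minimal sketch of that arithmetic; the function name is invented for illustration, and the 4.2x compression factor is the StumbleUpon average quoted above, so your own numbers will differ.

```python
def bytes_per_point(num_tags, value_bytes=8):
    """Approximate uncompressed cost of one data point, in bytes.

    16 bytes HBase overhead + 3 bytes metric + 4 bytes timestamp
    + 6 bytes per tag + 2 bytes OpenTSDB overhead + the value itself
    (1, 2, 4 or 8 bytes).
    """
    return 16 + 3 + 4 + 6 * num_tags + 2 + value_bytes


LZO_FACTOR = 4.2  # average compression factor reported in this FAQ

# Worst case: 4.5 tags on average and 8-byte values.
print(bytes_per_point(4.5))                             # 60.0
print(round(bytes_per_point(4.5) / LZO_FACTOR, 1))      # 14.3

# Best case: 4.5 tags on average and 1-byte integer values.
print(round(bytes_per_point(4.5, 1) / LZO_FACTOR, 1))   # 12.6

# 100 billion points at about 12 bytes each, in terabytes (10^12 bytes).
print(100e9 * 12 / 1e12)
```

The compressed estimates land in the same range as the roughly 12 bytes per data point reported above, and 100 billion points at 12 bytes each comes to about 1.2 terabytes before HDFS replication.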

Reliability

What are the Single Points of Failure of OpenTSDB?

OpenTSDB itself doesn't have any specific SPoF, as there is no central component and you can run multiple TSDs on different machines. The TSDs need HBase to run, and HBase doesn't have any SPoF^* either, as HBase only really needs a ZooKeeper quorum to keep serving. A ZooKeeper quorum is typically made of 5 different machines, out of which you can afford to lose 2 before the system goes down. Note that although HBase has a master, it is not actually needed for HBase to keep serving. Not having a master running will prevent HBase from starting or recovering from machine failures but, in a steady state, losing the master doesn't impede HBase's ability to serve.

^* Fine print: if your HBase cluster is backed by HDFS, which is most likely the case for production clusters at the time of this writing, then you have a SPoF because of the NameNode of HDFS. If you run HBase on top of a reliable distributed filesystem, then you don't have any SPoF.
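The "5 machines, lose 2" figure for the ZooKeeper quorum follows from majority voting: the ensemble keeps serving as long as a strict majority of its servers is up. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def tolerated_failures(ensemble_size):
    """Number of servers a majority-based quorum can lose and keep serving."""
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


for n in (3, 4, 5, 6, 7):
    print(n, tolerated_failures(n))
# 3 1
# 4 1
# 5 2
# 6 2
# 7 3
```

Note that an even-sized ensemble tolerates no more failures than the next smaller odd size, which is why odd sizes such as 3, 5 or 7 are the usual choice.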

What are the failure modes of OpenTSDB?

The TSD eventually becomes unhealthy when HBase itself is down or broken for an extended period of time. Right now, the TSD doesn't handle prolonged HBase outages very well and will discard incoming data points once its buffers are full if it's unable to flush them to HBase. Future versions will temporarily store data on local disk when the TSD is unable to reach HBase.

StumbleUpon has had a number of cases where a collector that runs on hundreds of machines goes crazy and generates a DDoS on the TSDs. The TSD doesn't do a good job of handling such DDoS situations by penalizing offending clients, so its performance will degrade once the machine it's running on is unable to keep up with the load.

What is the recommended deployment for OpenTSDB?

We recommend that you run multiple TSDs behind a load balancer such as Varnish, HAProxy or DNS round robin. StumbleUpon found it useful to dedicate one or more TSDs for read queries (human users using the web UI to generate graphs or to view dashboards) and let other TSDs handle the write queries (new data points coming in from production machines). For the "read-only" TSDs, we recommend Varnish for load balancing. Read more about Varnish and TSDs.

What data durability guarantees does OpenTSDB make?

By default the TSD buffers data points for about 1 second before persisting them in HBase (configurable via the --flush-interval flag). If the TSD were to crash without getting a chance to run its shutdown hook, you could lose up to 1 second's worth of data points. In practice we've found this trade-off to be acceptable given the performance benefits that deferred flushes offer in terms of write throughput. Once a data point has been stored in HBase, data durability is guaranteed if you're running HBase on top of a distributed filesystem that provides the necessary data durability guarantees.
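To put the "up to 1 second" figure in perspective, the worst-case loss for a single TSD crash is roughly the write rate multiplied by the flush interval. This is a sketch with a hypothetical function name and a hypothetical write rate; it ignores any data still in flight on the network.

```python
def max_points_at_risk(points_per_sec, flush_interval_sec=1.0):
    """Upper bound on data points buffered in a TSD at the moment of a crash."""
    return points_per_sec * flush_interval_sec


# Hypothetical TSD receiving 8000 data points per second.
print(max_points_at_risk(8000))        # 8000.0 with the default 1 second
print(max_points_at_risk(8000, 0.5))   # 4000.0 with a shorter interval
```

A shorter flush interval lowers the bound but gives up some of the write throughput that deferred flushes provide.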

If you use HDFS, we recommend that you run Cloudera's Distribution for Hadoop (CDH), version 3 or above preferably, as this version comes with all the necessary patches to make HDFS less unreliable and has better performance.

Data Model

How can I increment a counter in OpenTSDB?

OpenTSDB does not have a counter feature at this time, though work is underway. Currently OpenTSDB simply records (timestamp, value) pairs. Data points are independent from each other. Say you want to keep track of clicks on an ad in OpenTSDB. You wouldn't send a "+1" to the TSD for every click. Instead, if your application doesn't already keep track of click counts, you'd need to increment a counter for every click and periodically send the value of that counter to the TSD. You can later query the TSD and ask for the rate of change of the counter, which will give you clicks per second.
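The pattern described above can be sketched as follows. The application keeps its own running counter, periodically formats the current total as a `put` line in the TSD's text protocol, and the rate of change is derived later from two samples. The class and function names, the metric `ads.clicks` and the tag `host=web01` are all made up for illustration, and the code only builds the line; it does not open a connection to a TSD.

```python
class ClickCounter:
    """In-application counter: incremented per click, sampled periodically."""

    def __init__(self):
        self.count = 0

    def click(self):
        self.count += 1


def put_line(metric, timestamp, value, tags):
    """Format one data point as 'put <metric> <timestamp> <value> <tag=value ...>'."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}"


def rate(older, newer):
    """Rate of change per second between two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = older, newer
    return (v1 - v0) / (t1 - t0)


counter = ClickCounter()
for _ in range(3):
    counter.click()

# Send the running total, not a "+1" per click.
print(put_line("ads.clicks", 1356998400, counter.count, {"host": "web01"}))
# put ads.clicks 1356998400 3 host=web01

# Two samples taken 60 seconds apart: 300 more clicks, so 5 clicks per second.
print(rate((1356998400, 100), (1356998460, 400)))  # 5.0
```

In a real collector the timestamp would come from the current time and the line would be written to the TSD once per collection interval.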

Can I store sub-second precision timestamps in OpenTSDB?

As of version 2.0 you can store data with millisecond timestamps. However we recommend you avoid storing a data point at each millisecond as this will slow down queries dramatically. See Dates and Times for details.

Can I use another storage backend than HBase?

Not at this time. OpenTSDB was designed specifically for a storage backend that follows the Bigtable data model (a distributed, strongly consistent, sorted multi-dimensional hash map). At the time OpenTSDB was written, HBase was the only such system that was both open-source and usable in production, so the code was written specifically for HBase. Technically it would be feasible to port the code to other systems that follow the Bigtable data model. Systems that differ by not storing data in a sorted fashion (such as distributed hash tables) or that do not offer a strong consistency guarantee will simply not work with the current design.

We're looking at adding support for Cassandra now that it implements counters.

Misc

How do the TSDs handle DST changes or leap seconds?

The TSD doesn't assign timestamps to your data points, your collectors do. It is strongly recommended that you use UNIX timestamps in your collectors, so all your timestamps will be based on Epoch. This way you will not be affected by timezone adjustments or DST changes on your machines.
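A short illustration of why epoch-based timestamps are immune to timezone and DST settings: the same instant expressed in two different UTC offsets maps to the same UNIX timestamp. This is only a sketch; the helper name and the example dates are arbitrary.

```python
import time
from datetime import datetime, timedelta, timezone


def epoch_seconds(dt):
    """Convert a timezone-aware datetime to whole seconds since the Epoch."""
    return int(dt.timestamp())


# The same instant written in UTC and in a UTC+2 zone (e.g. a summer-time offset).
in_utc = datetime(2013, 7, 1, 12, 0, 0, tzinfo=timezone.utc)
in_utc_plus_2 = datetime(2013, 7, 1, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))

print(epoch_seconds(in_utc) == epoch_seconds(in_utc_plus_2))  # True

# In a collector, the current UNIX timestamp is simply:
now = int(time.time())
```

A collector that stamps each data point with `int(time.time())` therefore produces the same values regardless of the machine's local timezone configuration.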

The TSD always renders timestamps in local time when using the GUI, to make it easier for us humans to understand and correlate events based on the timezone we live in. So you should make sure you give the TSD the correct timezone setting (e.g. via the TZ environment variable). When the TSD starts, it computes its offset from UTC and will then keep that offset forever. In case of a DST change, for instance, it would then appear that the TSD is 1 hour behind. There are plans to periodically re-compute the offset from UTC to avoid that situation, but right now you have to restart the TSD in order to adjust the offset. Note that this doesn't prevent the TSD from working properly; it only affects anything that parses dates from local time or renders them in local time. Dashboards and alerting systems should use relative time (e.g. "1d ago") and should thus be unaffected.

When leap seconds occur, UNIX timestamps go back by one second. The TSD should handle this situation gracefully (although this hasn't been tested yet). Unless you're collecting data every second, you won't notice anything except that the interval between the two data points where the leap second occurred is one second less than it should have been. If you do collect data every second, the second data point that attempts to overwrite the previous one during the leap second will be discarded with an error message.

The graphs are ugly, can they be made prettier?

Ugliness is a subjective thing :)

There are a lot of knobs that aren't exposed yet that would allow the TSD to generate nicer, antialiased, smoothed graphs. It's just a matter of exposing those Gnuplot knobs. Also, recent versions of Gnuplot can generate graphs in HTML5 canvas. We plan to use this to build pretty graphs you can interact with from your web browser.

Please contribute to help make the UI sexier.

Can I use OpenTSDB to generate graphs for my customers / end-users?

Yes, but you have to be careful with that. OpenTSDB was written for internal use only, to help engineers and operations staff understand and manage large computer systems. It hasn't been through any security review and does not include authentication.

We don't recommend that you give direct access to the TSD to untrusted users. If you really want to leverage the TSD's graphing features, we recommend that you put the TSD behind a secured HTTP proxy that only allows specific requests to go through. Alternatively, you could use the TSD to periodically pre-generate a fixed set of graphs and serve them as static images to your customers.

Why does OpenTSDB return more data than I asked for in my query?

All queries specify a start time and an end time (if the end time isn't specified, it is assumed to be "now"). OpenTSDB's goal is to plot a sensible graph covering that time span. However, it needs to retrieve data before and after the times you actually specified in order to properly compute the values near the "edges" of the graph. Because these extra values are required to draw accurate graphs, OpenTSDB also returns them, on the assumption that if you want to plot your own graphs or do your own processing, you will need the extra data to get the correct behavior near the edges.

The amount of extra data that OpenTSDB attempts to retrieve is proportional to the time span covered by your query. The 2.0 HTTP API will only return data within the requested time span.
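If you receive the extra data and only want the points inside the span you asked for, you can trim the result on the client side. The sketch below assumes the result has already been parsed into (timestamp, value) pairs and that both ends of the span are inclusive; the function name and sample values are made up for illustration.

```python
def trim_to_span(points, start, end):
    """Keep only (timestamp, value) pairs with start <= timestamp <= end."""
    return [(ts, value) for ts, value in points if start <= ts <= end]


# Hypothetical result containing one extra point on each side of the span.
points = [(90, 1.0), (100, 2.0), (150, 3.0), (200, 4.0), (210, 5.0)]
print(trim_to_span(points, 100, 200))
# [(100, 2.0), (150, 3.0), (200, 4.0)]
```

Trim only after any interpolation or rate computation of your own, since those steps are the reason the extra points are returned in the first place.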

I don't understand the data points returned for my query

Sometimes the results of a query don't match people's expectations. This is often because it's not necessarily obvious what steps are involved in a query, why OpenTSDB uses interpolation, or when aggregators kick in. Please see the documentation for Aggregators for details.