OpenTSDB

tcollector

tcollector is a client-side process that gathers data from local collectors and pushes the data to OpenTSDB. You run it on all your hosts, and it does the work of sending each host's data to the TSD.

OpenTSDB is designed to make it easy to collect and write data to it. It has a simple protocol, simple enough for even a shell script to start sending data. However, to do so reliably and consistently is a bit harder. What do you do when your TSD server is down? How do you make sure your collectors stay running? This is where tcollector comes in.
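To give a feel for how simple the protocol is, here is a minimal sketch that formats one datapoint as a put line and writes it to a TSD over a plain TCP socket. The host, the port 4242, the metric name and both helper names are assumptions for illustration. Note that it does no buffering or retrying if the TSD is down, which is the gap tcollector fills.

```python
import socket


def format_put(metric, timestamp, value, tags):
    """Build one protocol line: put <metric> <timestamp> <value> <tag=value ...>."""
    tag_str = " ".join("%s=%s" % (k, v) for k, v in sorted(tags.items()))
    return "put %s %d %s %s\n" % (metric, timestamp, value, tag_str)


def send_datapoint(line, host="localhost", port=4242):
    """Send a single line to the TSD. Host and port are assumed values."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Example (needs a reachable TSD):
# send_datapoint(format_put("sys.example.metric", 1356998400, 42, {"host": "web01"}))
```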

Tcollector does several things for you:

Deduplication

Typically you want to gather data about everything in your system. This generates a lot of datapoints, the majority of which don't change very often over time (if ever). However, you want fine-grained resolution when they do change. Tcollector remembers the last value and timestamp that was sent for every time series of every collector it manages. If the value doesn't change between sample intervals, it suppresses sending that datapoint. Once the value does change (or 10 minutes have passed), it sends the last suppressed value and timestamp, plus the current value and timestamp. This way your graphs stay correct. Deduplication typically reduces the number of datapoints the TSD needs to collect by a large fraction, which reduces network load and storage in the backend. A future OpenTSDB release, however, will improve on the storage format by using RLE (among other things), making it essentially free to store repeated values.
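The rule described above can be sketched roughly as follows. This is an illustration of the behaviour only, not tcollector's actual code; the Deduper class, its method name and the way a series is keyed are all hypothetical.

```python
class Deduper:
    """Suppress repeated values per time series (illustrative sketch)."""

    def __init__(self, max_silence=600):
        self.max_silence = max_silence  # 10 minutes, as described in the text
        self.state = {}  # series key -> last sent/suppressed info

    def process(self, key, timestamp, value):
        """Return the list of (timestamp, value) points to send for this sample."""
        st = self.state.get(key)
        if st is None:
            # First sample of a series is always sent.
            self.state[key] = {"sent_ts": timestamp, "value": value, "skipped_ts": None}
            return [(timestamp, value)]
        unchanged = value == st["value"]
        if unchanged and timestamp - st["sent_ts"] < self.max_silence:
            st["skipped_ts"] = timestamp  # remember the point we did not send
            return []
        points = []
        if st["skipped_ts"] is not None:
            # Send the last suppressed point first so the flat segment ends correctly.
            points.append((st["skipped_ts"], st["value"]))
        points.append((timestamp, value))
        st.update(sent_ts=timestamp, value=value, skipped_ts=None)
        return points
```

With samples every 15 seconds and a value that stays at 1 and then changes to 2, the first sample is sent, the repeats are suppressed, and the change produces two points: the last suppressed 1 and the new 2.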

Collecting lots of metrics with tcollector

Collectors in tcollector can be written in any language. They just need to be executable and output the data to stdout. Tcollector will handle the rest. The collectors are placed in the collectors directory. Tcollector iterates over every directory named with a number in that directory and runs all the collectors in each directory. If you name the directory 60, then tcollector will try to run every collector in that directory every 60 seconds. Use the directory 0 for any collectors that are long-lived and run continuously. Tcollector will read their output and respawn them if they die. Generally you want to write long-lived collectors since that has less overhead. OpenTSDB is designed to have lots of datapoints for each metric (for most metrics we send datapoints every 15 seconds).
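As a rough sketch of a long-lived collector, the script below prints one line per datapoint to stdout every 15 seconds. The line layout (metric, Unix timestamp, value, then optional tag=value pairs) is an assumption here, since the text above does not spell it out, and the metric name example.loadavg.1min and the function names are made up for illustration.

```python
import os
import sys
import time


def format_line(metric, timestamp, value, tags=None):
    """Build one output line: metric, Unix timestamp, value, then tag=value pairs."""
    parts = [metric, str(int(timestamp)), str(value)]
    parts.extend("%s=%s" % (k, v) for k, v in sorted((tags or {}).items()))
    return " ".join(parts)


def run(interval=15, iterations=None, out=sys.stdout):
    """Emit the 1-minute load average every `interval` seconds.

    iterations=None loops forever, as a collector in the 0 directory would;
    a finite count is handy for trying the script by hand.
    """
    done = 0
    while iterations is None or done < iterations:
        load1 = os.getloadavg()[0]
        out.write(format_line("example.loadavg.1min", time.time(), load1) + "\n")
        out.flush()  # flush so the reader sees each point promptly
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval)


# When deployed as a collector, end the file with a call to run().
```

Flushing after every line matters for a long-lived process, because output sitting in a buffer would otherwise reach the reader late and in bursts.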

Any directories in the collectors directory whose names are not numeric are ignored. We've included lib and etc directories for library and config data used by all collectors.
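The directory convention can be illustrated with a short sketch that maps each numeric directory to the executables inside it. The function name and the return shape are hypothetical, and this is not how tcollector itself is implemented.

```python
import os
import re

NUMERIC = re.compile(r"^[0-9]+$")


def find_collectors(collectors_dir):
    """Map interval in seconds -> sorted list of executable files in that directory."""
    found = {}
    for name in sorted(os.listdir(collectors_dir)):
        path = os.path.join(collectors_dir, name)
        if not (NUMERIC.match(name) and os.path.isdir(path)):
            continue  # lib, etc and any other non-numeric names are ignored
        scripts = []
        for entry in sorted(os.listdir(path)):
            full = os.path.join(path, entry)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                scripts.append(full)
        found[int(name)] = scripts
    return found
```

An interval of 0 in the result stands for the long-lived collectors; any other key is the number of seconds between runs.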

Collectors bundled with tcollector

The following are the collectors we've included as part of the base package, together with all of the metric names they report on and what they mean. If you have any others you'd like to contribute, we'd love to hear about them so we can reference them or include them with your permission in a future release.

0/dfstat.py

These stats are from running /usr/bin/df -PlTk. The metrics include time series tagged with each mount point and the filesystem's fstype. Since TSD doesn't allow slashes in tag values, we replace / with _. This collector also filters out any debugfs and devtmpfs filesystems, as well as any mountpoints mounted under /dev/ or /lib/.

With these tags you can select to graph just a specific filesystem, or all filesystems with a particular fstype (e.g. ext3).
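The filtering and tag substitution can be sketched like this, run against a made-up sample of df -PlTk output. The tag names mount and fstype, the function name and the sample rows are assumptions for illustration; the bundled collector may differ in detail.

```python
SKIPPED_FSTYPES = {"debugfs", "devtmpfs"}
SKIPPED_PREFIXES = ("/dev/", "/lib/")

SAMPLE = """\
Filesystem     Type     1024-blocks    Used Available Capacity Mounted on
/dev/sda1      ext3        10000000 4000000   6000000      40% /
tmpfs          tmpfs        1000000       0   1000000       0% /dev/shm
/dev/sda2      ext3        20000000 5000000  15000000      25% /var/log
"""


def parse_df(output):
    """Return (tags, total_kb, used_kb, free_kb) for each filesystem we keep."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 6)
        if len(fields) != 7:
            continue
        _device, fstype, total, used, free, _capacity, mount = fields
        mount = mount.strip()
        if fstype in SKIPPED_FSTYPES or mount.startswith(SKIPPED_PREFIXES):
            continue
        # TSD does not allow slashes in tag values, so / becomes _.
        tags = {"mount": mount.replace("/", "_"), "fstype": fstype}
        rows.append((tags, int(total), int(used), int(free)))
    return rows
```

On the sample above this keeps / and /var/log (tagged mount=_ and mount=_var_log) and drops /dev/shm.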

0/ifstat.py

These stats are from /proc/net/dev. They are interface counters, tagged with the interface (iface=) and the direction (direction=in or direction=out). Only ethN interfaces are tracked. We intentionally exclude bondN interfaces, because bonded interfaces still keep counters on their child ethN interfaces and we don't want to double-count a box's network traffic if you don't select on iface=.
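A minimal sketch of this parsing, applied to a made-up /proc/net/dev sample, might look as follows. The field names follow the column headers of /proc/net/dev, but the proc.net. metric prefix and the function name are assumptions for illustration.

```python
import re

RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed")
ETH = re.compile(r"^eth\d+$")

SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    500       5    0    0    0     0          0         0      500       5    0    0    0     0       0          0
  eth0:   1000      10    0    0    0     0          0         0     2000      20    0    0    0     0       0          0
 bond0:   1000      10    0    0    0     0          0         0     2000      20    0    0    0     0       0          0
"""


def parse_net_dev(text, timestamp):
    """Return one output line per counter for every ethN interface."""
    lines = []
    for raw in text.splitlines()[2:]:  # the first two lines are headers
        name, _, counters = raw.partition(":")
        iface = name.strip()
        if not ETH.match(iface):
            continue  # skips lo, bondN and anything else that is not ethN
        values = counters.split()
        if len(values) != 16:
            continue
        for direction, fields, vals in (("in", RX_FIELDS, values[:8]),
                                        ("out", TX_FIELDS, values[8:])):
            for field, val in zip(fields, vals):
                lines.append("proc.net.%s %d %s iface=%s direction=%s"
                             % (field, timestamp, val, iface, direction))
    return lines
```

Because bond0 is skipped, summing over all interfaces of a host counts the eth0 traffic once.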

0/iostat.py

Data is from /proc/diskstats.

See iostats.txt

/proc/diskstats has 11 stats for a given physical device. These are all rate counters, except ios_in_progress.

       .read_requests       Number of reads completed
       .read_merged         Number of reads merged
       .read_sectors        Number of sectors read
       .msec_read           Time in msec spent reading
       .write_requests      Number of writes completed
       .write_merged        Number of writes merged
       .write_sectors       Number of sectors written
       .msec_write          Time in msec spent writing
       .ios_in_progress     Number of I/O operations in progress
       .msec_total          Time in msec doing I/O
       .msec_weighted_total Weighted time doing I/O (multiplied by ios_in_progress)

In kernel 2.6.25 and later, by-partition stats are reported the same way as disk stats.

NOTE: In 2.6 kernels before 2.6.25, partitions have only 4 stats each:

       .read_issued
       .read_sectors
       .write_issued
       .write_sectors

For partitions, these *_issued stats are counters collected before requests are merged, so they aren't the same as *_requests (which are post-merge and so more closely represent the actual number of disk transactions).

Given that diskstats provides both per-disk and per-partition data, for TSDB purposes we put them under different metrics (versus the same metric and different tags). Otherwise, if you look at a given metric, the data for a given box will be double-counted, since a given operation will increment both the disk series and the partition series. To fix this, we output by-disk data to iostat.disk.* and by-partition data to iostat.part.*.
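The split between iostat.disk.* and iostat.part.* can be sketched as below, using a made-up /proc/diskstats sample. The dev= tag, the function name and the whole_disks argument are assumptions for illustration; the caller is expected to supply the set of whole-disk names (for example from listing /sys/block), and the bundled collector may tell disks from partitions differently.

```python
DISK_FIELDS = ("read_requests", "read_merged", "read_sectors", "msec_read",
               "write_requests", "write_merged", "write_sectors", "msec_write",
               "ios_in_progress", "msec_total", "msec_weighted_total")
OLD_PART_FIELDS = ("read_issued", "read_sectors", "write_issued", "write_sectors")

SAMPLE = """\
   8       0 sda 100 5 800 40 200 10 1600 90 0 120 130
   8       1 sda1 50 400 60 480
"""


def parse_diskstats(text, timestamp, whole_disks):
    """Return output lines, with disks and partitions under different metrics."""
    out = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 7:
            continue
        device, stats = parts[2], parts[3:]  # parts[0:2] are major and minor
        prefix = "iostat.disk." if device in whole_disks else "iostat.part."
        if len(stats) >= 11:
            fields = DISK_FIELDS  # extra columns on newer kernels are ignored
        elif len(stats) == 4:
            fields = OLD_PART_FIELDS  # pre-2.6.25 partition layout
        else:
            continue
        for field, value in zip(fields, stats):
            out.append("%s%s %d %s dev=%s" % (prefix, field, timestamp, value, device))
    return out
```

Keeping the two prefixes separate means that summing iostat.disk.read_requests across a host never includes the partition series.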

0/netstat.py

Socket allocation and network statistics.

Metrics from /proc/net/sockstat:

Metrics from /proc/net/netstat (netstat -s command):

0/procnettcp.py

These stats are all from /proc/net/tcp{,6}. (Note that if IPv6 is enabled, some IPv4 connections seem to get put into /proc/net/tcp6.) The collector sleeps 60 seconds between intervals. Due in part to a kernel performance issue in older kernels and in part to systems with many TCP connections, this collector can sometimes take 5 minutes or more to run one interval, so the frequency of datapoints can be highly variable depending on the system.

For each run of the collector, we classify each connection and generate subtotals. TSD will automatically total these up when displaying the graph, but you can drill down into each possible subtotal or a particular one. Each connection is tagged with user=username (taken from a fixed list of users we care about, or "other" if not in the list), with state= (established, time_wait, etc.), and with service= (http, mysql, memcache, etc.).

Note that once a connection is closed, Linux seems to forget who opened/handled the connection. Connections in time_wait, for example, will always show user=root.

This collector does generate a large number of datapoints, as the number of points is (S*(U+1)*V), where S=number of TCP states, U=number of users you track, and V=number of services (collections of ports). The deduper reduces this very well, as only 3 of the 10 TCP states are generally ever seen. On a typical server this can dedup down to under 10 values per interval.
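To make the classification and the point count concrete, here is a small sketch over a made-up /proc/net/tcp sample. The uid and port maps, the three-state subset and the function names are hypothetical; only the S*(U+1)*V formula comes from the text.

```python
from collections import Counter

STATES = {"01": "established", "06": "time_wait", "0A": "listen"}  # subset of kernel state codes
SERVICES = {80: "http", 3306: "mysql", 11211: "memcache"}          # hypothetical port map
USERS = {0: "root", 33: "www-data"}                                # hypothetical uid map

SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000    33        0 1234
   1: 0100007F:0050 0100007F:C000 01 00000000:00000000 00:00000000 00000000    33        0 1235
   2: 0100007F:C001 0100007F:0CEA 06 00000000:00000000 00:00000000 00000000     0        0 0
"""


def max_series(num_states, num_users, num_services):
    """Upper bound on series per host: S * (U + 1) * V, the +1 being "other"."""
    return num_states * (num_users + 1) * num_services


def classify(text):
    """Count connections per (user, state, service) combination."""
    counts = Counter()
    for raw in text.splitlines()[1:]:  # skip the header line
        f = raw.split()
        if len(f) < 8:
            continue
        local_port = int(f[1].rsplit(":", 1)[1], 16)   # ports are hexadecimal
        remote_port = int(f[2].rsplit(":", 1)[1], 16)
        state = STATES.get(f[3], "other")
        user = USERS.get(int(f[7]), "other")
        service = SERVICES.get(local_port) or SERVICES.get(remote_port, "other")
        counts[(user, state, service)] += 1
    return counts
```

With 10 states, 4 tracked users and 5 services, max_series gives 250 possible series per host, yet the sample above touches only three of them, which is why deduplication works so well here.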

0/procstats.py

Miscellaneous stats from /proc

See http://www.linuxhowtos.org/System/procstat.htm