Setup HBase
In order to use OpenTSDB, you need to have
HBase up and running.
This page will help you get started with a simple, single-node HBase
setup, which is good enough to evaluate OpenTSDB or monitor small
installations. If you need scalability and reliability, you will
need to set up a full HBase cluster.
You can copy-paste all the following instructions directly into a terminal.
Setup a single-node HBase instance
If you already have an HBase cluster,
skip this step.
If you're going to use fewer than 5-10 nodes, stick to a single node.
Deploying HBase on a single node is easy and can help get you started
with OpenTSDB quickly. You can always scale to a real cluster and migrate
your data later.
wget http://www.apache.org/dist/hbase/hbase-0.98.10.1/hbase-0.98.10.1-hadoop1-bin.tar.gz
tar xfz hbase-0.98.10.1-hadoop1-bin.tar.gz
cd hbase-0.98.10.1-hadoop1
At this point, you are ready to start HBase (without HDFS) on a single
node. But before starting it, I recommend using the following configuration:
hbase_rootdir=${TMPDIR-'/tmp'}/tsdhbase
iface=lo`uname | sed -n s/Darwin/0/p`
cat >conf/hbase-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///$hbase_rootdir/hbase-\${user.name}/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>$iface</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>$iface</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>$iface</value>
  </property>
</configuration>
EOF
Make sure to adjust the value of hbase_rootdir if you want HBase
to store its data somewhere more durable than a temporary directory.
The default is to use /tmp, which means you'll lose all your data
whenever your server reboots. The remaining settings are less important
and simply force HBase to stick to the loopback interface (lo0 on
Mac OS X, or just lo on Linux), which simplifies things when you're
just testing HBase on a single node.
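For example, to keep your data across reboots, you can point
hbase_rootdir at a persistent directory before generating
conf/hbase-site.xml above (the path below is just an illustration; any
directory writable by the user running HBase will do):
# Assumed example path; pick any durable location you like.
hbase_rootdir=/var/lib/tsdhbase
mkdir -p "$hbase_rootdir"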
Now start HBase:
./bin/start-hbase.sh
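To sanity-check that HBase came up, you can ask the HBase shell for the
cluster status (the status command is built into the shell; the exact
output varies by version, but you should see one live server):
echo "status" | ./bin/hbase shell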
Using LZO
There is no reason not to use LZO with HBase. Except in rare cases, the CPU
cycles spent on LZO compression / decompression pay for themselves by
saving you far more time on I/O. This is certainly true for OpenTSDB,
where LZO can easily compress OpenTSDB's binary data by 3 to 4x. Installing
LZO is simple and is done as follows.
Pre-requisites
In order to build hadoop-lzo, you need to have Ant installed as
well as liblzo2 with its development headers:
apt-get install ant liblzo2-dev # Debian/Ubuntu
yum install ant ant-nodeps lzo-devel.x86_64 # RedHat/CentOS/Fedora
brew install lzo # Mac OS X
Compile & Deploy
Thanks to our friends at Cloudera for maintaining the Hadoop-LZO package:
git clone git://github.com/cloudera/hadoop-lzo.git
cd hadoop-lzo
CLASSPATH=path/to/hadoop-core-1.0.4.jar CFLAGS=-m64 CXXFLAGS=-m64 ant compile-native tar
hbasedir=path/to/hbase
mkdir -p $hbasedir/lib/native
cp build/hadoop-lzo-0.4.14/hadoop-lzo-0.4.14.jar $hbasedir/lib
cp -a build/hadoop-lzo-0.4.14/lib/native/* $hbasedir/lib/native
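Before restarting HBase, you can verify that it is able to load the
native LZO codec with HBase's built-in CompressionTest utility (the test
file path below is arbitrary; the tool writes a small LZO-compressed
file there and reads it back):
$hbasedir/bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/lzotest lzo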
Restart HBase and make sure you create your tables with
COMPRESSION => 'LZO'.
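If you created the tables before installing LZO, you can switch an
existing table from the HBase shell. For instance, for OpenTSDB's tsdb
table (its data lives in the column family named t; adjust the names if
your schema differs):
./bin/hbase shell
disable 'tsdb'
alter 'tsdb', {NAME => 't', COMPRESSION => 'LZO'}
enable 'tsdb'
Existing data is rewritten with LZO as regions go through compaction;
only new writes are compressed immediately.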
Common gotchas:
- Where to find hadoop-core-1.0.4.jar? On a normal, production
HBase install, it will be under HBase's lib/ directory. In your
development environment it may be stashed under HBase's target/
directory; use find to locate it.
- On Mac OS X, you may get "error: Native java headers not found. Is
$JAVA_HOME set correctly?" when configure is looking for jni.h,
in which case you need to insert
CPPFLAGS=-I/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers
before CLASSPATH on the 3rd command above (the one that invokes ant).
- On RedHat/CentOS/Fedora you may have to specify where Java is, by adding
JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64 (or similar)
to the ant command-line, before the CLASSPATH.
- On RedHat/CentOS/Fedora, if you get the weird error message "Cause:
the class org.apache.tools.ant.taskdefs.optional.Javah was not found."
then you need to install the ant-nodeps package.
- The build may fail with "[javah] Error: Class
org.apache.hadoop.conf.Configuration could not be found." in which case
you need to apply this change to build.xml.
- On Ubuntu, the build may fail to compile the code with
"LzoCompressor.c:125:37: error: expected expression before ',' token".
As per HADOOP-2009, the solution is to add LDFLAGS='-Wl,--no-as-needed'
to the command-line (a combined invocation is sketched after this list).
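Putting several of these fixes together, a Linux build of hadoop-lzo
might look like the following (the JAVA_HOME value and the jar path are
examples taken from the gotchas above; adjust them to your system and
drop the flags you don't need):
JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64 \
CLASSPATH=path/to/hadoop-core-1.0.4.jar \
CFLAGS=-m64 CXXFLAGS=-m64 LDFLAGS='-Wl,--no-as-needed' \
ant compile-native tar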
Migrating to a real HBase cluster
TBD. In short:
- Shut down all your TSDs.
- Shut down your single-node HBase cluster.
- Copy the directories named tsdb and tsdb-uid from your local
filesystem to the HDFS cluster backing your real HBase cluster
(see the sketch after this list).
- Run ./bin/hbase org.jruby.Main ./bin/add_table.rb /hdfs/path/to/hbase/tsdb
and again for the tsdb-uid directory.
- Restart your real HBase cluster (sorry).
- Restart your TSDs after making sure they now use your real HBase
cluster.
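A minimal sketch of the copy step, assuming your single-node data lives
under the hbase_rootdir chosen earlier and that your real cluster keeps
HBase under /hbase on HDFS (both paths are assumptions, and the exact
on-disk layout of table directories varies across HBase versions):
# Assumed paths; locate the tsdb and tsdb-uid directories on your system first.
hadoop fs -copyFromLocal $hbase_rootdir/hbase-$USER/hbase/tsdb /hbase/tsdb
hadoop fs -copyFromLocal $hbase_rootdir/hbase-$USER/hbase/tsdb-uid /hbase/tsdb-uid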
Putting HBase in production
TBD. In short:
- Stay on a single node unless you can deploy HBase on at least 5 machines,
preferably at least 10.
- Make sure you have LZO installed
and make sure it's enabled for the tables used by OpenTSDB.
- TBD...