Setup HBase
In order to use OpenTSDB, you need to have
HBase up and running.
This page will help you get started with a simple, single-node HBase
setup, which is good enough to evaluate OpenTSDB or monitor small
installations. If you need scalability and reliability, you will
need to set up a full HBase cluster.
You can copy-paste all the following instructions directly into a terminal.
Setup a single-node HBase instance
If you already have an HBase cluster,
skip this step.
If you're going to use fewer than 5-10 nodes, stick to a single node.
Deploying HBase on a single node is easy and can help get you started
with OpenTSDB quickly. You can always scale to a real cluster and migrate
your data later.
wget http://www.apache.org/dist/hbase/hbase-0.98.10.1/hbase-0.98.10.1-hadoop1-bin.tar.gz
tar xfz hbase-0.98.10.1-hadoop1-bin.tar.gz
cd hbase-0.98.10.1-hadoop1
At this point, you are ready to start HBase (without HDFS) on a single
node. But before starting it, I recommend using the following configuration:
hbase_rootdir=${TMPDIR-'/tmp'}/tsdhbase
iface=lo`uname | sed -n s/Darwin/0/p`
cat >conf/hbase-site.xml <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///$hbase_rootdir/hbase-\${user.name}/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>$iface</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>$iface</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>$iface</value>
  </property>
</configuration>
EOF
Make sure to adjust the value of hbase_rootdir if you want HBase
to store its data somewhere more durable than a temporary directory.
The default is to use /tmp, which means you'll lose all your data
whenever your server reboots. The remaining settings are less important
and simply force HBase to stick to the loopback interface (lo0 on
Mac OS X, or just lo on Linux), which simplifies things when you're
just testing HBase on a single node.
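For example, to keep your data across reboots, you can point
hbase_rootdir at a persistent directory before generating
conf/hbase-site.xml above (the path below is just an illustration; any
directory writable by the user running HBase will do):
# Assumed example path; pick any durable location you like.
hbase_rootdir=/var/lib/tsdhbase
mkdir -p "$hbase_rootdir"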
Now start HBase:
./bin/start-hbase.sh
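To sanity-check that HBase came up, you can ask the HBase shell for the
cluster status (the status command is built into the shell; the exact
output varies by version, but you should see one live server):
echo "status" | ./bin/hbase shell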
Using LZO
There is no reason not to use LZO with HBase. Except in rare cases, the CPU
cycles spent on LZO compression / decompression pay for themselves by
saving you far more time on I/O. This is certainly true for OpenTSDB,
where LZO can easily compress OpenTSDB's binary data by 3 to 4x. Installing
LZO is simple and is done as follows.
Pre-requisites
In order to build hadoop-lzo, you need to have Ant installed as
well as liblzo2 with its development headers:
apt-get install ant liblzo2-dev # Debian/Ubuntu
yum install ant ant-nodeps lzo-devel.x86_64 # RedHat/CentOS/Fedora
brew install lzo # Mac OS X
Compile & Deploy
Thanks to our friends at Cloudera for maintaining the Hadoop-LZO package:
git clone git://github.com/cloudera/hadoop-lzo.git
cd hadoop-lzo
CLASSPATH=path/to/hadoop-core-1.0.4.jar CFLAGS=-m64 CXXFLAGS=-m64 ant compile-native tar
hbasedir=path/to/hbase
mkdir -p $hbasedir/lib/native
cp build/hadoop-lzo-0.4.14/hadoop-lzo-0.4.14.jar $hbasedir/lib
cp -a build/hadoop-lzo-0.4.14/lib/native/* $hbasedir/lib/native
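Before restarting HBase, you can verify that it is able to load the
native LZO codec with HBase's built-in CompressionTest utility (the test
file path below is arbitrary; the tool writes a small LZO-compressed
file there and reads it back):
$hbasedir/bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/lzotest lzo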
Restart HBase and make sure you create your tables with
COMPRESSION => 'LZO'.
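If you created the tables before installing LZO, you can switch an
existing table from the HBase shell. For instance, for OpenTSDB's tsdb
table (its data lives in the column family named t; adjust the names if
your schema differs):
./bin/hbase shell
disable 'tsdb'
alter 'tsdb', {NAME => 't', COMPRESSION => 'LZO'}
enable 'tsdb'
Existing data is rewritten with LZO as regions go through compaction;
only new writes are compressed immediately.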
Common gotchas:
- Where to find hadoop-core-1.0.4.jar? On a normal, production
HBase install, it will be under HBase's lib/ directory. In your
development environment it may be stashed under HBase's target/
directory; use find to locate it.
- On Mac OS X, you may get "error: Native java headers not found. Is
$JAVA_HOME set correctly?" when configure is looking for jni.h,
in which case you need to insert
CPPFLAGS=-I/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers
before CLASSPATH on the 3rd command above (the one that invokes ant).
- On RedHat/CentOS/Fedora you may have to specify where Java is, by adding
JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64 (or similar)
to the ant command-line, before the CLASSPATH.
- On RedHat/CentOS/Fedora, if you get the weird error message "Cause:
the class org.apache.tools.ant.taskdefs.optional.Javah was not found."
then you need to install the ant-nodeps package.
- The build may fail with "[javah] Error: Class
org.apache.hadoop.conf.Configuration could not be found." in which case
you need to apply this change to build.xml.
- On Ubuntu, the build may fail to compile the code with
"LzoCompressor.c:125:37: error: expected expression before ',' token".
As per HADOOP-2009, the solution is to add LDFLAGS='-Wl,--no-as-needed'
to the command-line (a combined invocation is sketched after this list).
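Putting several of these fixes together, a Linux build of hadoop-lzo
might look like the following (the JAVA_HOME value and the jar path are
examples taken from the gotchas above; adjust them to your system and
drop the flags you don't need):
JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64 \
CLASSPATH=path/to/hadoop-core-1.0.4.jar \
CFLAGS=-m64 CXXFLAGS=-m64 LDFLAGS='-Wl,--no-as-needed' \
ant compile-native tar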
Migrating to a real HBase cluster
TBD. In short:
- Shut down all your TSDs.
- Shut down your single-node HBase cluster.
- Copy the directories named tsdb and tsdb-uid from your local
filesystem to the HDFS cluster backing your real HBase cluster
(see the sketch after this list).
- Run ./bin/hbase org.jruby.Main ./bin/add_table.rb /hdfs/path/to/hbase/tsdb
and again for the tsdb-uid directory.
- Restart your real HBase cluster (sorry).
- Restart your TSDs after making sure they now use your real HBase
cluster.
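A minimal sketch of the copy step, assuming your single-node data lives
under the hbase_rootdir chosen earlier and that your real cluster keeps
HBase under /hbase on HDFS (both paths are assumptions, and the exact
on-disk layout of table directories varies across HBase versions):
# Assumed paths; locate the tsdb and tsdb-uid directories on your system first.
hadoop fs -copyFromLocal $hbase_rootdir/hbase-$USER/hbase/tsdb /hbase/tsdb
hadoop fs -copyFromLocal $hbase_rootdir/hbase-$USER/hbase/tsdb-uid /hbase/tsdb-uid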
Putting HBase in production
TBD. In short:
- Stay on a single node unless you can deploy HBase on at least 5 machines,
preferably at least 10.
- Make sure you have LZO installed
and make sure it's enabled for the tables used by OpenTSDB.
- TBD...