Inverse DNS resolution with HBase and Phoenix

Recently I did some quick experiments with HBase (1.1.2) and Phoenix (4.6.0). As a quite big dataset I used the DNS data available as a dnscensus 2013 archive. This dataset contains DNS data collected by extraction of port 53 data out of network captures at some central internet routers. Here some examples:

To get the inverse map of all collected type A DNS requests the phoenix table should have a key starting with the ip address! Don’t confuse reverse dns lookup (PTR) with the inverse of a type A DNS request!

The table create script for phoenix looks like

To get the original data in the right order I used sed

A bit tricky was the line endings of the original file data, why I used .* to rid of all type of white spaces after the ip addresses. The tail command just removes the header line, which is not needed when the data get imported into phoenix.

How many rows do we have?

Loading the csv data into phoenix is quite simple with the bulk load utility:


The result is a list of dns names which resolve to the ip of google’s public dns server!

The select from a table with almost 1 billion of rows took less than a 10th of a second! Nice 🙂

chukwa usage with hadoop 2 (2.4.0) and hbase 0.98

If you want to use chukwa with hadoop 2.4.0 and hbase 0.98 you need to exchange some jars, because the trunk version uses older versions of hadoop and hbase (at the time of writing).

To be sure that I have the newest version I cloned the sources from the github repository

using the git command

In the local chukwa directory the build is easy done using maven:

After the successful maven build the fresh tarball of chukwa should be available in the target subdirectory.

Now we can untar the tarball in a place of our choice and configure chukwa like it is described in the tutorial. The hbase schema file has some small typos which I corrected in a fork.

Before starting chukwa we have to exchange some jars. First turn off some jars that are for older hadoop / hbase versions.

Copy the following jars from the hadoop and/or hbase distribution to the same chukwa directory.

Now I started a chukwa collector on my namenode and a agent on a linux box. After some minutes the first log data can be seen in HBase:

If writing to hdfs you will see something like: