Inverse DNS resolution with HBase and Phoenix

Recently I ran some quick experiments with HBase (1.1.2) and Phoenix (4.6.0). As a fairly large dataset I used the DNS data available in the dnscensus 2013 archive. This dataset contains DNS records collected by extracting port 53 traffic from network captures at several central internet routers. Here are some examples:

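To get the inverse map of all collected type A DNS records, the Phoenix table should have a primary key that starts with the IP address! Don't confuse a reverse DNS lookup (PTR record) with the inverse of a type A record: a PTR lookup only returns the name(s) that the operator of an address block has published in the reverse (in-addr.arpa) zone, whereas the inverse of the A records lists every name that was observed resolving to that address!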
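To make the distinction concrete, here is a minimal Python sketch of what "inverting" the A records means. The records are hypothetical placeholders (documentation-range addresses and example.com names), not rows from the dnscensus archive.

```python
from collections import defaultdict

# Hypothetical sample of observed type A records: (hostname, IPv4 address).
# Placeholder values only, not data from the dnscensus 2013 archive.
A_RECORDS = [
    ("www.example.com", "192.0.2.10"),
    ("mail.example.com", "192.0.2.10"),
    ("shop.example.org", "192.0.2.10"),
    ("api.example.net", "198.51.100.7"),
]


def invert_a_records(records):
    """Map each IP address to every hostname observed resolving to it.

    This is the inverse of the forward A lookup (hostname -> IP). It is not
    a PTR lookup, which only returns what the address owner published in
    the in-addr.arpa zone.
    """
    inverse = defaultdict(set)
    for hostname, ip in records:
        inverse[ip].add(hostname)
    # Sort the hostnames so the result is deterministic.
    return {ip: sorted(names) for ip, names in inverse.items()}


if __name__ == "__main__":
    inverse = invert_a_records(A_RECORDS)
    print(inverse["192.0.2.10"])
    # ['mail.example.com', 'shop.example.org', 'www.example.com']
```

In the Phoenix table the same grouping comes from the key order rather than from a dictionary: rows whose key starts with the same IP address are stored next to each other.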

The CREATE TABLE script for Phoenix looks like this:

To get the original data into the right column order, I used sed:

The line endings of the original data file were a bit tricky, which is why I used .* in the pattern to discard everything after the IP addresses (in practice, the various kinds of trailing whitespace and carriage returns). The tail command just removes the header line, which is not needed when the data is imported into Phoenix.
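The transformation that sed and tail perform can be sketched in Python. The input layout assumed here (one header line, then "hostname,ip" per line with possible trailing whitespace or a carriage return) is a guess, not the documented format of the archive; adjust the pattern to the real columns.

```python
import re

# Assumed (hypothetical) input layout: "hostname,ip" followed by optional junk
# such as spaces, tabs or '\r'. The trailing '.*' plays the same role as in
# the sed pattern: it swallows whatever follows the IP address.
LINE_RE = re.compile(r"^([^,]*),([0-9.]+).*$")


def reorder_lines(lines):
    """Yield 'ip,hostname' for every data line, skipping the header.

    Skipping index 0 mirrors 'tail' dropping the header line; the regex
    substitution mirrors the sed step that swaps the two fields.
    """
    for index, line in enumerate(lines):
        if index == 0:
            continue  # header line, not needed for the Phoenix import
        match = LINE_RE.match(line)
        if match:
            hostname, ip = match.groups()
            yield f"{ip},{hostname}"


if __name__ == "__main__":
    sample = [
        "hostname,ip\n",
        "www.example.com,192.0.2.10 \r\n",
        "api.example.net,198.51.100.7\t\n",
    ]
    for out in reorder_lines(sample):
        print(out)
```

Unlike a plain sed substitution, which passes non-matching lines through unchanged, this sketch drops them; counting the skipped lines would be a sensible addition before a real import.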

How many rows do we have?
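Counting the rows of a file this size is best done as a stream. A minimal Python sketch that counts newlines chunk by chunk, similar in spirit to `wc -l`, and subtracts the header line; pass any file opened in binary mode. The chunk size is an arbitrary choice.

```python
import io


def count_data_rows(stream, chunk_size=1 << 20, header_lines=1):
    """Count newline-terminated lines in a binary stream, minus the header.

    Reads fixed-size chunks so a multi-gigabyte CSV never has to fit in
    memory. Like 'wc -l', a final line without a trailing newline is not
    counted.
    """
    newlines = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        newlines += chunk.count(b"\n")
    return max(newlines - header_lines, 0)


if __name__ == "__main__":
    demo = io.BytesIO(
        b"hostname,ip\n"
        b"www.example.com,192.0.2.10\r\n"
        b"api.example.net,198.51.100.7\n"
    )
    print(count_data_rows(demo))  # 2
```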

Loading the CSV data into Phoenix is quite simple with the bulk load utility:
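As a sketch of how such a load can be scripted, the Python helper below only assembles the command line for Phoenix's MapReduce CSV loader, `org.apache.phoenix.mapreduce.CsvBulkLoadTool`, with its `--table`, `--input` and optional `--zookeeper` options. The jar path, table name and input path are hypothetical placeholders. Depending on the setup, the Phoenix documentation also suggests putting the HBase classpath and configuration on `HADOOP_CLASSPATH` (for example via `hbase mapredcp`) before running it.

```python
import shlex
import subprocess

# Tool class of the Phoenix MapReduce CSV bulk loader.
BULK_LOAD_TOOL = "org.apache.phoenix.mapreduce.CsvBulkLoadTool"


def bulk_load_command(client_jar, table, input_path, zookeeper=None):
    """Build the argv for the CSV bulk loader.

    client_jar, table and input_path are installation-specific; the values
    used below are hypothetical.
    """
    cmd = ["hadoop", "jar", client_jar, BULK_LOAD_TOOL,
           "--table", table, "--input", input_path]
    if zookeeper is not None:
        cmd += ["--zookeeper", zookeeper]
    return cmd


def run_bulk_load(cmd):
    """Run the loader; raises CalledProcessError if the job fails."""
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical jar, table and HDFS path: only prints the command.
    demo = bulk_load_command("phoenix-client.jar", "DNS_INVERSE", "/data/a_records.csv")
    print(shlex.join(demo))
```

Building the argument list separately from running it makes the exact command easy to inspect or log before a long MapReduce job starts.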


The result is a list of DNS names that resolve to the IP address of Google's public DNS server!

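The select on a table with almost 1 billion rows took less than a tenth of a second! Nice 🙂 That works because the IP address leads the row key, so HBase only has to read the contiguous block of rows stored for that address instead of scanning the whole table.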
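The speed of the lookup can be illustrated with an in-memory analogy in Python (this is not HBase or Phoenix code). Keep hypothetical (ip, hostname) pairs sorted, binary-search to the first key for the address, and read forward while the IP still matches. This mirrors a range scan over a row key that starts with the IP.

```python
from bisect import bisect_left

# Hypothetical rows keyed by (ip, hostname). Like HBase rows, they are kept
# sorted by key, so all hostnames for one IP are adjacent.
ROWS = sorted([
    ("192.0.2.10", "www.example.com"),
    ("198.51.100.7", "api.example.net"),
    ("192.0.2.10", "mail.example.com"),
    ("192.0.2.9", "old.example.com"),
])


def names_for_ip(rows, ip):
    """Return all hostnames stored under `ip`.

    Seek to the first key >= (ip, "") in O(log n), then read forward only
    while the IP still matches, so the cost is O(log n + matches).
    """
    names = []
    i = bisect_left(rows, (ip, ""))
    while i < len(rows) and rows[i][0] == ip:
        names.append(rows[i][1])
        i += 1
    return names


if __name__ == "__main__":
    print(names_for_ip(ROWS, "192.0.2.10"))
    # ['mail.example.com', 'www.example.com']
```

A binary search over 10^9 sorted keys needs about log2(10^9) ≈ 30 comparisons. HBase does not search one flat list (it first locates the region, then uses the HFile block indexes), so the analogy is loose, but the cost likewise grows with the number of matching rows rather than with the table size. IP addresses stored as strings sort lexicographically ("192.0.2.10" before "192.0.2.9"). That is fine for exact-match lookups like this one but not for numeric range queries.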