Yarn / Tez / Google protocol buffer error

Some people seem to get errors when running Hive on Tez. I have seen the same stack trace in several posts:

When I searched the Hadoop 2.3.0 source code for details about this error, I couldn’t find the classes named in the stack trace. The reason is that the class

org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto

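is generated automatically from Hadoop's .proto files during the build. It therefore only exists after the sources have been built, for example with: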

mvn install -DskipTests=true -Dmaven.javadoc.skip=true
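The Hadoop build runs protoc to generate classes such as YarnProtos, and the BUILDING.txt of Hadoop 2.x asks for ProtocolBuffer 2.5.0. A small pre-flight check can catch a mismatched protoc before the build fails. This is only a sketch. It assumes that protoc --version prints a line like "libprotoc 2.5.0", and the function name protoc_is_25 is hypothetical.

```shell
# Sketch: succeed only if a protoc version string belongs to the 2.5.x
# series expected by the Hadoop 2.x build. Assumes the usual
# "libprotoc <major>.<minor>.<patch>" format of `protoc --version`.
protoc_is_25() {
  case "$1" in
    "libprotoc 2.5."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (only where protoc is installed):
#   protoc_is_25 "$(protoc --version)" || echo "protoc 2.5.x required" >&2
```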

Once the build has finished, a search for this class succeeds:
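One way to run such a search is sketched below. It assumes the generated sources and compiled classes end up somewhere below the Hadoop source tree, in a directory path ending in org/apache/hadoop/yarn/proto. The helper name find_yarn_protos is hypothetical.

```shell
# Sketch: list generated YarnProtos files (sources and compiled classes)
# below a Hadoop source tree. The tree root is $1 (default: current dir).
find_yarn_protos() {
  # find's -path pattern is matched against the whole path and '*' also
  # matches '/', so this hits YarnProtos.java as well as nested classes
  # such as YarnProtos$ApplicationIdProto.class.
  find "${1:-.}" -type f -path '*/org/apache/hadoop/yarn/proto/YarnProtos*'
}

# Usage, from the root of the Hadoop source tree:
#   find_yarn_protos .
```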

The lines causing the trouble are:

Searching the internet for this method, I found this post: http://goo.gl/s4jmje

Using classfinder, we can search all projects on our server for this class. It finds relevant matches in several Hive projects.
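The classfinder invocation is not shown here. As a rough stand-in (this is NOT classfinder itself), a similar search over jar files can be scripted with find and grep. The sketch relies on zip entry names being stored uncompressed inside a jar, so a plain byte search finds them. The helper name jars_with_class is hypothetical.

```shell
# Sketch of a classfinder-like search: print every jar below directory $1
# that contains class $2 (binary name, e.g. with '$' for nested classes).
jars_with_class() {
  # org.apache.Foo$Bar -> org/apache/Foo$Bar.class, the entry name in the jar
  entry="$(printf '%s' "$2" | tr . /).class"
  # grep -l prints matching file names; it exits 1 when a batch has no hit,
  # which only means "not in these jars", so that is not treated as an error.
  find "$1" -type f -name '*.jar' -exec grep -l -F -- "$entry" {} + || true
}

# Usage (single quotes keep the shell away from the '$'):
#   jars_with_class /path/to/projects 'org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto'
```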

In the Hive 0.14 snapshot build, the explicit dependency on the Google protocol buffer jar version 2.4.1 has been removed. After switching Hive to this snapshot version, the error with Tez disappears.
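To see which protocol buffer versions are actually deployed side by side, a quick inventory of the jars can help. This sketch assumes the jars follow the usual Maven file name protobuf-java-<version>.jar. The helper name protobuf_versions is hypothetical.

```shell
# Sketch: print "<version> <path>" for every protobuf-java jar below $1,
# sorted, e.g. to spot a project still bundling 2.4.1 next to another version.
protobuf_versions() {
  find "$1" -type f -name 'protobuf-java-*.jar' | while IFS= read -r jar; do
    version=$(basename "$jar" .jar | sed 's/^protobuf-java-//')
    printf '%s %s\n' "$version" "$jar"
  done | sort
}

# Usage:
#   protobuf_versions /path/to/hive
```

Inside a Maven project, mvn dependency:tree -Dincludes=com.google.protobuf shows which dependency pulls in protobuf-java in the first place.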

Hadoop / YARN / Tez

0. Introductory Remarks

Here is my extended version of the Tez installation / deployment documentation. The original version can be found on the Tez web page.

I used Hadoop 2.3.0 to try out the Tez installation, so I assume here that your Hadoop cluster is already up and running. If you do not have a running Hadoop cluster yet, see the Hadoop web page for installation instructions.

My Hadoop installation runs on Debian Wheezy.

1. Download the Tez tarball

You can get the Tez source tarball from its Apache Incubator page.

2. Compile the sources using maven

Before running Maven to build Tez, we have to edit the pom.xml file in the root of the unpacked Tez directory so that Tez is built against the Hadoop version of the cluster. In my case I changed the property hadoop.version from 2.2.0 to 2.3.0.
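The edit can also be scripted. This is a sketch, assuming the version is declared as a single <hadoop.version>...</hadoop.version> element in pom.xml. The helper name set_hadoop_version is hypothetical.

```shell
# Sketch: set every <hadoop.version> element in a pom.xml to a new value.
# Writes to a temp file and renames it, because `sed -i` behaves
# differently in GNU and BSD sed.
set_hadoop_version() {
  pom="$1" version="$2"
  sed "s|<hadoop.version>[^<]*</hadoop.version>|<hadoop.version>$version</hadoop.version>|" \
    "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}

# Usage, in the root of the unpacked Tez sources:
#   set_hadoop_version pom.xml 2.3.0
```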

When Maven is started with the following command, it usually downloads some missing jars from the central repository first.
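A plausible build invocation is sketched below. It assumes the same skip flags as the Hadoop build in the previous post and the goals clean package. Both are assumptions, not necessarily the exact command used here. The helper name build_tez and the DRY_RUN switch are hypothetical.

```shell
# Sketch: build Tez from source directory $1 (default: current dir),
# skipping tests and javadoc. Goals and flags are assumptions.
# Set DRY_RUN=1 to print the command instead of running it.
build_tez() {
  set -- mvn -f "${1:-.}/pom.xml" clean package -DskipTests=true -Dmaven.javadoc.skip=true
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Usage: build_tez /path/to/tez-src
```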

3. Adapting configuration files

3.1 Add TEZ variables to .bashrc

As the hadoop user, edit ~/.bashrc and append the following lines:
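A sketch of what these lines might look like follows. The variable names TEZ_JARS and TEZ_CONF_DIR and the path /opt/tez are assumptions: TEZ_JARS points at the unpacked Tez build (its jars plus lib/), and TEZ_CONF_DIR points at the directory that will hold tez-site.xml.

```shell
# Sketch for ~/.bashrc (variable names and /opt/tez are assumptions).
# Location of the built Tez jars and their lib/ directory:
export TEZ_JARS=/opt/tez
# Directory that will contain tez-site.xml (next to the Hadoop configs):
export TEZ_CONF_DIR="${HADOOP_INSTALL}/etc/hadoop"
```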

3.2 Add Tez jars to the hadoop environment shell script

Edit ${HADOOP_INSTALL}/etc/hadoop/hadoop-env.sh

Right after the lines

add the following lines to put the Tez jars on the Hadoop classpath:
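A sketch of such lines, assuming TEZ_JARS and TEZ_CONF_DIR were exported from ~/.bashrc in step 3.1. The fallback values are hypothetical. The classpath wildcards are quoted on purpose, so that the JVM rather than the shell expands them.

```shell
# Sketch for hadoop-env.sh; the fallbacks are hypothetical paths.
TEZ_JARS="${TEZ_JARS:-/opt/tez}"
TEZ_CONF_DIR="${TEZ_CONF_DIR:-${HADOOP_INSTALL}/etc/hadoop}"
# Quoting keeps the shell from expanding '*'; the JVM expands a 'dir/*'
# classpath entry to all jars in that directory. The ':+' form avoids a
# leading ':' when HADOOP_CLASSPATH was empty before.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+${HADOOP_CLASSPATH}:}${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*"
```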

3.3 Upload the Tez jars to HDFS
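A sketch of the upload follows. It assumes the local Tez jars sit in $TEZ_JARS with their dependencies in lib/, as in step 3.1. The HDFS target /apps/tez and the helper name upload_tez_jars are hypothetical, and any HDFS path will do as long as tez-site.xml points at it.

```shell
# Sketch: copy the Tez jars and their lib/ dependencies into HDFS.
# $1 = local Tez directory, $2 = HDFS target (default: /apps/tez, hypothetical).
upload_tez_jars() {
  local_dir="${1:?local Tez directory required}"
  hdfs_dir="${2:-/apps/tez}"
  hadoop fs -mkdir -p "$hdfs_dir/lib"
  # Top-level Tez jars first, then their dependencies from lib/.
  hadoop fs -put "$local_dir"/*.jar "$hdfs_dir/"
  hadoop fs -put "$local_dir"/lib/*.jar "$hdfs_dir/lib/"
}

# Usage: upload_tez_jars "$TEZ_JARS"
```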

3.4 Create a Tez configuration file

Create or edit the file tez-site.xml in ${HADOOP_INSTALL}/etc/hadoop/
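Tez finds its jars in HDFS through the property tez.lib.uris, which takes a comma-separated list of URIs. The sketch below prints a minimal tez-site.xml pointing at the hypothetical /apps/tez upload location. It assumes ${fs.defaultFS} is resolved by Hadoop's configuration variable expansion against core-site.xml. If in doubt, write the full hdfs://host:port URI instead. The helper name tez_site_xml is hypothetical.

```shell
# Sketch: print a minimal tez-site.xml; $1 is the HDFS jar directory.
tez_site_xml() {
  hdfs_dir="${1:-/apps/tez}"
  # '\$' keeps ${fs.defaultFS} literal: Hadoop, not the shell, expands it.
  printf '%s\n' \
    '<?xml version="1.0"?>' \
    '<configuration>' \
    '  <property>' \
    '    <name>tez.lib.uris</name>' \
    "    <value>\${fs.defaultFS}$hdfs_dir/,\${fs.defaultFS}$hdfs_dir/lib/</value>" \
    '  </property>' \
    '</configuration>'
}

# Usage: tez_site_xml /apps/tez > "${HADOOP_INSTALL}/etc/hadoop/tez-site.xml"
```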

3.5 Configure MapReduce to use yarn-tez instead of yarn

Create or edit mapred-site.xml in ${HADOOP_INSTALL}/etc/hadoop/
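The switch itself is the property mapreduce.framework.name with the value yarn-tez. A minimal sketch that prints such a file follows. Redirecting it with '>' replaces an existing mapred-site.xml, so if you already have other properties there, only change this one value by hand. The helper name mapred_site_xml is hypothetical.

```shell
# Sketch: print a minimal mapred-site.xml selecting the yarn-tez framework.
mapred_site_xml() {
  printf '%s\n' \
    '<?xml version="1.0"?>' \
    '<configuration>' \
    '  <property>' \
    '    <name>mapreduce.framework.name</name>' \
    '    <value>yarn-tez</value>' \
    '  </property>' \
    '</configuration>'
}

# Usage: mapred_site_xml > "${HADOOP_INSTALL}/etc/hadoop/mapred-site.xml"
```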

3.6 Start or restart yarn

${HADOOP_INSTALL}/sbin/stop-yarn.sh

${HADOOP_INSTALL}/sbin/start-yarn.sh
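After the restart, jps (from the JDK) should list the YARN daemons. On a single-node setup both ResourceManager and NodeManager run locally. On a multi-node cluster the NodeManagers run on the worker hosts, so only check for ResourceManager on the master. The helper name yarn_daemons_up is hypothetical.

```shell
# Sketch: check that the given daemons (default: ResourceManager and
# NodeManager) show up in the output of jps on this host.
yarn_daemons_up() {
  [ $# -gt 0 ] || set -- ResourceManager NodeManager
  running=$(jps)
  for daemon in "$@"; do
    # jps prints lines like "12345 ResourceManager".
    printf '%s\n' "$running" | grep -q " $daemon\$" || {
      echo "$daemon is not running" >&2
      return 1
    }
  done
}

# Usage: yarn_daemons_up                   (single-node setup)
#        yarn_daemons_up ResourceManager   (master of a multi-node cluster)
```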

4. Start a tez example

The Tez tarball includes several examples. Let’s run the orderedwordcount example.

Create an HDFS directory /tests/tez-examples/in and copy some text files of your choice into it.
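A minimal sketch of this step, using the standard hadoop fs -mkdir -p and -put commands. The local files are whatever you pass as arguments, and the helper name put_example_input is hypothetical.

```shell
# Sketch: create the example input directory in HDFS and copy the given
# local text files into it.
put_example_input() {
  hadoop fs -mkdir -p /tests/tez-examples/in
  hadoop fs -put "$@" /tests/tez-examples/in/
}

# Usage: put_example_input ~/texts/*.txt
```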

Then execute the command:
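A sketch of such an invocation follows. It assumes the example is registered under the program name orderedwordcount in the Tez MapReduce examples jar. The jar file name (something like tez-mapreduce-examples-<version>.jar) and the output directory /tests/tez-examples/out are assumptions to adapt to your build. The output directory typically must not exist yet. The helper name run_orderedwordcount is hypothetical.

```shell
# Sketch: run orderedwordcount on the input directory created above.
# $1 = examples jar (name is an assumption), $2 = HDFS output directory
# (hypothetical default; usually it must not exist before the run).
run_orderedwordcount() {
  hadoop jar "$1" orderedwordcount /tests/tez-examples/in "${2:-/tests/tez-examples/out}"
}

# Usage: run_orderedwordcount "$TEZ_JARS"/tez-mapreduce-examples-*.jar
```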

The output should look similar to mine here:
