Hadoop / YARN / Tez

0. Introductory Remarks

Here is my extended version of the Tez installation and deployment documentation. The original version can be found on the Tez webpage.

I used Hadoop 2.3.0 to try out the Tez installation, so I assume here that your Hadoop cluster is already up and running. If you do not have a running Hadoop cluster yet, see the Hadoop webpage for installation instructions.

My Hadoop installation runs on Debian Wheezy.

1. Download the Tez tarball

You can get the Tez source tarball from its Apache Incubator page.

2. Compile the sources using maven

Before starting Maven to build Tez, edit the pom.xml file in the root of the unpacked Tez directory so that the hadoop.version property matches your cluster. In my case I changed hadoop.version from 2.2.0 to 2.3.0.
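The pom.xml change can also be scripted. A minimal sketch, assuming GNU sed (as on Debian) and that the property is written on a single line such as <hadoop.version>2.2.0</hadoop.version>; the function name set_hadoop_version is hypothetical:

```shell
# Sketch: set the hadoop.version property in Tez's top-level pom.xml.
# Assumes GNU sed and that the property sits on one line.
set_hadoop_version() {
  local pom="$1"      # path to pom.xml in the unpacked Tez source tree
  local version="$2"  # Hadoop version of your cluster, e.g. 2.3.0
  # Keep a backup, then replace whatever value sits between the tags.
  cp "$pom" "$pom.bak"
  sed -i "s|<hadoop.version>[^<]*</hadoop.version>|<hadoop.version>${version}</hadoop.version>|" "$pom"
}

# Usage (hypothetical path):
# set_hadoop_version ~/tez-src/pom.xml 2.3.0
```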

When you start Maven with the following command, it usually downloads some missing jars from the central repository first.
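A typical build invocation, as a sketch. The flags -DskipTests=true and -Dmaven.javadoc.skip=true are an assumption on my part; they only shorten the build by skipping unit tests and javadoc generation. The wrapper function build_tez and the path in the usage line are hypothetical:

```shell
# Sketch: build Tez from the root of its source tree, skipping tests and
# javadoc to speed up the build. Runs in a subshell so the caller's
# working directory is unchanged.
build_tez() {
  local src_dir="$1"   # root of the unpacked Tez source tree (contains pom.xml)
  ( cd "$src_dir" && mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true )
}

# Usage (hypothetical path):
# build_tez ~/tez-src
```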

3. Adapting configuration files

3.1 Add TEZ variables to .bashrc

As the hadoop user, edit ~/.bashrc and append the following lines:
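A sketch of such lines. The variable names TEZ_CONF_DIR and TEZ_JARS are assumptions chosen so that hadoop-env.sh can refer to them, and the directories /usr/local/hadoop and /usr/local/tez are hypothetical; adjust them to where Hadoop is installed and where your built Tez jars live:

```shell
# Sketch of lines for ~/.bashrc (paths are hypothetical examples).
export HADOOP_INSTALL="${HADOOP_INSTALL:-/usr/local/hadoop}"  # keep an existing value if set
export TEZ_CONF_DIR="${HADOOP_INSTALL}/etc/hadoop"             # directory holding tez-site.xml
export TEZ_JARS="${TEZ_JARS:-/usr/local/tez}"                  # directory with the built Tez jars
```

Run `source ~/.bashrc` or open a new shell so the variables take effect.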

3.2 Add Tez jars to the hadoop environment shell script

Edit ${HADOOP_INSTALL}/etc/hadoop/hadoop-env.sh

Right after the lines

add the following new lines to put the Tez jars on the Hadoop classpath:
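A sketch of such lines, assuming TEZ_CONF_DIR and TEZ_JARS hold the Tez configuration directory and the directory with the built jars. If these variables are not visible when Hadoop sources hadoop-env.sh, define them in hadoop-env.sh as well:

```shell
# Sketch: append the Tez config dir and the Tez jars (top level and lib/)
# to HADOOP_CLASSPATH, keeping any entries that are already there.
# The quoted '/*' entries are not expanded by the shell on purpose.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+${HADOOP_CLASSPATH}:}${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*"
```

Java treats a classpath entry ending in `/*` as "all jar files in that directory", so there is no need to list the jars one by one.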

3.3 Upload Tez jars to HDFS
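A minimal sketch of the upload, assuming the hdfs command is on the PATH and using /apps/tez as a hypothetical HDFS directory (any directory works as long as tez-site.xml points to it). The helper upload_tez_jars is hypothetical:

```shell
# Sketch: copy the built Tez jars, including the lib/ subdirectory,
# into an HDFS directory that tez.lib.uris can later point to.
upload_tez_jars() {
  local local_dir="$1"   # local directory with the Tez jars, e.g. "$TEZ_JARS"
  local hdfs_dir="$2"    # target directory in HDFS, e.g. /apps/tez
  hdfs dfs -mkdir -p "$hdfs_dir"
  # -put copies directories recursively, so lib/ is uploaded too.
  hdfs dfs -put "$local_dir"/* "$hdfs_dir"/
}

# Usage:
# upload_tez_jars "$TEZ_JARS" /apps/tez
```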

3.4 Create a Tez configuration file

Create/Edit the file tez-site.xml in ${HADOOP_INSTALL}/etc/hadoop/
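A minimal tez-site.xml, written from the shell as a sketch. It assumes the jars were uploaded to the hypothetical HDFS directory /apps/tez; the tez.lib.uris property tells Tez where to find its jars, and ${fs.defaultFS} is written literally so that Hadoop expands it when it reads the file. The helper write_tez_site is hypothetical and overwrites an existing tez-site.xml after saving a .bak copy:

```shell
# Sketch: write a minimal tez-site.xml pointing tez.lib.uris at the jars in HDFS.
write_tez_site() {
  local conf_dir="$1"   # e.g. "${HADOOP_INSTALL}/etc/hadoop"
  local hdfs_dir="$2"   # HDFS directory holding the Tez jars, e.g. /apps/tez
  local file="${conf_dir}/tez-site.xml"
  if [ -f "$file" ]; then
    cp "$file" "${file}.bak"   # keep the previous version
  fi
  # \${fs.defaultFS} is escaped so the shell leaves it for Hadoop to expand.
  cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>\${fs.defaultFS}${hdfs_dir},\${fs.defaultFS}${hdfs_dir}/lib/</value>
  </property>
</configuration>
EOF
}

# Usage:
# write_tez_site "${HADOOP_INSTALL}/etc/hadoop" /apps/tez
```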

3.5 Configure MapReduce to use yarn-tez instead of yarn

Create or edit mapred-site.xml in ${HADOOP_INSTALL}/etc/hadoop/
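Again as a sketch: a minimal mapred-site.xml whose only property sets mapreduce.framework.name to yarn-tez. The helper write_mapred_site is hypothetical; if your mapred-site.xml already contains other settings, add just this property by hand instead of overwriting the file:

```shell
# Sketch: write a minimal mapred-site.xml that runs MapReduce jobs on yarn-tez.
write_mapred_site() {
  local conf_dir="$1"   # e.g. "${HADOOP_INSTALL}/etc/hadoop"
  local file="${conf_dir}/mapred-site.xml"
  if [ -f "$file" ]; then
    cp "$file" "${file}.bak"   # keep the previous version
  fi
  # Quoted 'EOF': nothing in the XML is expanded by the shell.
  cat > "$file" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn-tez</value>
  </property>
</configuration>
EOF
}

# Usage:
# write_mapred_site "${HADOOP_INSTALL}/etc/hadoop"
```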

3.6 Start or restart YARN

${HADOOP_INSTALL}/sbin/stop-yarn.sh

${HADOOP_INSTALL}/sbin/start-yarn.sh

4. Start a tez example

The Tez tarball includes several examples. Let’s run the orderedwordcount example.

Create an HDFS directory /tests/tez-examples/in and copy some text files of your choice into it.
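A sketch of this step, assuming the hdfs command is on the PATH; prepare_input is a hypothetical helper, and which local files you pass in is up to you:

```shell
# Sketch: create the example input directory in HDFS and copy the given
# local text files into it.
prepare_input() {
  local hdfs_in="/tests/tez-examples/in"
  hdfs dfs -mkdir -p "$hdfs_in"
  hdfs dfs -put "$@" "$hdfs_in"/
}

# Usage (hypothetical local files):
# prepare_input ~/texts/*.txt
```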

Then execute the command:
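A sketch of such an invocation, assuming the examples jar lies in $TEZ_JARS and is named tez-mapreduce-examples-<version>.jar, and that orderedwordcount takes an input and an output directory. The output directory /tests/tez-examples/out is hypothetical and must not exist yet:

```shell
# Sketch: run the orderedwordcount example from the Tez examples jar.
run_orderedwordcount() {
  local jar
  # Take the first matching examples jar; the pattern stays literal if none match.
  for jar in "${TEZ_JARS}"/tez-mapreduce-examples-*.jar; do
    break
  done
  hadoop jar "$jar" orderedwordcount /tests/tez-examples/in /tests/tez-examples/out
}

# Usage:
# run_orderedwordcount
```

Once the job has finished, `hdfs dfs -cat /tests/tez-examples/out/*` shows the word counts.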

The output should look similar to mine here:

Author: