Hadoop / YARN / Tez

0. Introductory Remarks

Here is my extended version of the installation / deployment documentation for Tez. The original version can be seen on the Tez webpage.

I used Hadoop 2.3.0 to try the Tez installation, so I assume here that your Hadoop cluster is already up and running. If you do not have a running Hadoop cluster yet, see the Hadoop webpage for installation instructions.

My Hadoop installation runs on Debian Wheezy.

1. Download the Tez tarball

You can get the source tarball of Tez from its Apache Incubator page.

2. Compile the sources using maven

Before building Tez with Maven, we have to change the pom.xml file in the root of the unpacked Tez directory. In my case I changed the hadoop.version property from 2.2.0 to 2.3.0.
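The property change can also be scripted. The following is a minimal sketch, assuming the pom.xml contains the element <hadoop.version>2.2.0</hadoop.version> on a single line; set_hadoop_version is a hypothetical helper name of mine, not something shipped with Tez or Maven.

```shell
# Sketch: rewrite the hadoop.version property of a pom.xml in place.
# set_hadoop_version is a hypothetical helper, not part of Tez or Maven.
set_hadoop_version() {
  pom_file="$1"; old_version="$2"; new_version="$3"
  # Touch only the <hadoop.version>OLD</hadoop.version> element;
  # sed keeps a backup of the original file as pom.xml.bak.
  sed -i.bak "s|<hadoop.version>${old_version}</hadoop.version>|<hadoop.version>${new_version}</hadoop.version>|" "$pom_file"
}
```

Run from the Tez source root, the call for my setup would be: set_hadoop_version pom.xml 2.2.0 2.3.0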

Start the Maven build from the root directory of the Tez sources. On the first run Maven usually downloads some missing jars from the central repository.
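A sketch of the build step, using the Maven command quoted in the comments to this post. The build log in the comments shows that the build fails when the protoc compiler is not installed, so the sketch checks for it first. missing_tools is a hypothetical helper name.

```shell
# Print the name of every given tool that is not on the PATH.
# missing_tools is a hypothetical helper for this sketch.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools mvn protoc)"
if [ -n "$missing" ]; then
  echo "Install these tools before building:" $missing >&2
elif [ ! -f pom.xml ]; then
  echo "Run this from the root directory of the Tez sources." >&2
else
  # Build all modules, skipping tests and javadoc generation.
  mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true
fi
```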

3. Adapting configuration files

3.1 Add TEZ variables to .bashrc

As the hadoop user, edit ~/.bashrc and append the following lines:
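The lines might look like the sketch below. TEZ_CONF_DIR and TEZ_JARS are the variable names used in the Tez installation documentation; TEZ_HOME, the Tez version number, the build output path and the /opt/hadoop fallback are assumptions of mine that you will need to adjust to your system.

```shell
# Hypothetical location of the unpacked and built Tez sources.
export TEZ_HOME="${HOME}/tez-0.4.0-incubating"
# Directory that will hold tez-site.xml (/opt/hadoop is only a fallback guess).
export TEZ_CONF_DIR="${HADOOP_INSTALL:-/opt/hadoop}/etc/hadoop"
# Directory with the Tez jars produced by the Maven build (assumed path).
export TEZ_JARS="${TEZ_HOME}/tez-dist/target/tez-0.4.0-incubating-full"
```

Check after the build that the directory in TEZ_JARS really contains the Tez jars and a lib/ subdirectory.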

3.2 Add Tez jars to the hadoop environment shell script

Edit ${HADOOP_INSTALL}/etc/hadoop/hadoop-env.sh

Right after the lines

add the following new lines to add tez jars
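A sketch of such an addition, assuming TEZ_CONF_DIR and TEZ_JARS are set as in section 3.1. The two fallback paths are hypothetical and only keep the fragment usable when .bashrc has not been sourced.

```shell
# Hypothetical fallbacks in case the variables from .bashrc are not set.
: "${TEZ_CONF_DIR:=/opt/hadoop/etc/hadoop}"
: "${TEZ_JARS:=/opt/tez}"
# Append the Tez config directory and jars, keeping any existing classpath entries.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+${HADOOP_CLASSPATH}:}${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*"
```

The wildcards are left unexpanded on purpose: the JVM expands a classpath entry ending in /* to all jars in that directory.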

3.3 Upload Tez jars to HDFS
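The upload can be sketched with the standard hadoop fs commands. upload_tez_jars is a hypothetical helper, and the HDFS target directory /apps/tez used in the usage note is an assumption; any directory works as long as tez-site.xml points to it.

```shell
# HADOOP_CMD can be overridden (for example with "echo") for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# upload_tez_jars is a hypothetical helper: copy a local jar directory to HDFS.
upload_tez_jars() {
  local_dir="$1"; hdfs_dir="$2"
  # Create the target directory, then copy the jars and the lib/ subdirectory.
  "$HADOOP_CMD" fs -mkdir -p "$hdfs_dir" &&
    "$HADOOP_CMD" fs -put "$local_dir"/* "$hdfs_dir"/
}
```

With the variables from section 3.1 the call would be: upload_tez_jars "$TEZ_JARS" /apps/tez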

3.4 Create a Tez configuration file

Create/Edit the file tez-site.xml in ${HADOOP_INSTALL}/etc/hadoop/
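A minimal tez-site.xml could be written as sketched below. tez.lib.uris is the Tez property that tells Tez where its jars live in HDFS; the /apps/tez location and the use of ${fs.defaultFS} as the HDFS prefix are assumptions that must match your upload directory and cluster. write_tez_site is a hypothetical helper name.

```shell
# write_tez_site is a hypothetical helper: write a minimal tez-site.xml into a directory.
write_tez_site() {
  conf_dir="$1"
  # Quoted heredoc delimiter: ${fs.defaultFS} must reach the file unexpanded,
  # because Hadoop (not the shell) substitutes it when reading the configuration.
  cat > "${conf_dir}/tez-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/apps/tez,${fs.defaultFS}/apps/tez/lib/</value>
  </property>
</configuration>
EOF
}
```

The call would be: write_tez_site "${HADOOP_INSTALL}/etc/hadoop". Note that it overwrites an existing tez-site.xml.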

3.5 Configure MapReduce to use yarn-tez instead of yarn

Create or edit mapred-site.xml in ${HADOOP_INSTALL}/etc/hadoop/
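The relevant setting is the property mapreduce.framework.name, which gets the value yarn-tez. As a sketch, the hypothetical helper below creates a minimal mapred-site.xml and refuses to overwrite an existing file, in which case you would add the property block by hand.

```shell
# write_mapred_site is a hypothetical helper: create a minimal mapred-site.xml.
write_mapred_site() {
  target="$1/mapred-site.xml"
  # Never clobber an existing configuration file.
  if [ -e "$target" ]; then
    echo "$target exists; add the property by hand." >&2
    return 1
  fi
  cat > "$target" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn-tez</value>
  </property>
</configuration>
EOF
}
```

The call would be: write_mapred_site "${HADOOP_INSTALL}/etc/hadoop"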

3.6 Start or restart yarn

${HADOOP_INSTALL}/sbin/stop-yarn.sh

${HADOOP_INSTALL}/sbin/start-yarn.sh

4. Start a tez example

The Tez tarball includes several examples. Let’s start the orderedwordcount example.

Create an HDFS directory /tests/tez-examples/in and copy some text files of your choice into it.

Then execute the command:
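Preparing the input and launching the example can be sketched as follows. prepare_input and run_orderedwordcount are hypothetical helpers; the name of the examples jar and the output directory in the usage note are assumptions, since they depend on your Tez version and your own choice.

```shell
# HADOOP_CMD can be overridden (for example with "echo") for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Create the HDFS input directory and copy the given local text files into it.
prepare_input() {
  "$HADOOP_CMD" fs -mkdir -p /tests/tez-examples/in &&
    "$HADOOP_CMD" fs -put "$@" /tests/tez-examples/in/
}

# Arguments: examples jar, HDFS input directory, HDFS output directory.
run_orderedwordcount() {
  "$HADOOP_CMD" jar "$1" orderedwordcount "$2" "$3"
}
```

A possible call sequence is prepare_input *.txt followed by run_orderedwordcount tez-mapreduce-examples-0.4.0-incubating.jar /tests/tez-examples/in /tests/tez-examples/out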

The output should look similar to mine here:

7 thoughts on “Hadoop / YARN / Tez”

  1. Hi,

    I tried to build the project above after changing the Hadoop version from 2.0.2 to 2.0.4, using the command below:

    mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true

    Then I get the following exception. Can you please provide a solution for this?

    SLF4J: Failed to load class “org.slf4j.impl.StaticLoggerBinder”.
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    [INFO] Scanning for projects…
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.tez:tez-api:jar:0.4.0-incubating
    [WARNING] ‘build.pluginManagement.plugins.plugin.(groupId:artifactId)’ must be unique but found duplicate declaration of plugin org.codehaus.mojo:findbugs-maven-plugin @ org.apache.tez:tez:0.4.0-incubating, /home/techgene/tez-0.4.0-incubating/pom.xml, line 494, column 13
    [WARNING] ‘build.plugins.plugin.version’ for org.apache.hadoop:hadoop-maven-plugins is missing. @ line 74, column 12
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.tez:tez-runtime-library:jar:0.4.0-incubating
    [WARNING] ‘build.plugins.plugin.version’ for org.apache.hadoop:hadoop-maven-plugins is missing. @ line 68, column 15
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.tez:tez-runtime-internals:jar:0.4.0-incubating
    [WARNING] ‘build.plugins.plugin.version’ for org.apache.hadoop:hadoop-maven-plugins is missing. @ line 73, column 15
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.tez:tez-mapreduce:jar:0.4.0-incubating
    [WARNING] ‘build.plugins.plugin.version’ for org.apache.hadoop:hadoop-maven-plugins is missing. @ line 106, column 15
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.tez:tez-dag:jar:0.4.0-incubating
    [WARNING] ‘build.plugins.plugin.version’ for org.apache.hadoop:hadoop-maven-plugins is missing. @ line 121, column 15
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.tez:tez:pom:0.4.0-incubating
    [WARNING] ‘build.pluginManagement.plugins.plugin.(groupId:artifactId)’ must be unique but found duplicate declaration of plugin org.codehaus.mojo:findbugs-maven-plugin @ line 494, column 13
    [WARNING]
    [WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
    [WARNING]
    [WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
    [WARNING]
    [INFO] ————————————————————————
    [INFO] Reactor Build Order:
    [INFO]
    [INFO] tez
    [INFO] tez-api
    [INFO] tez-common
    [INFO] tez-runtime-internals
    [INFO] tez-runtime-library
    [INFO] tez-mapreduce
    [INFO] tez-mapreduce-examples
    [INFO] tez-tests
    [INFO] tez-dag
    [INFO] tez-dist
    [INFO] Tez
    [INFO]
    [INFO] ————————————————————————
    [INFO] Building tez 0.4.0-incubating
    [INFO] ————————————————————————
    [INFO]
    [INFO] — buildnumber-maven-plugin:1.1:create (default) @ tez —
    [INFO] Checking for local modifications: skipped.
    [INFO] Updating project files from SCM: skipped.
    [INFO] Executing: /bin/sh -c cd /home/techgene/tez-0.4.0-incubating && git rev-parse --verify HEAD
    [INFO] Working directory: /home/techgene/tez-0.4.0-incubating
    [INFO] Storing buildNumber: null at timestamp: 1399295235782
    [INFO] Executing: /bin/sh -c cd /home/techgene/tez-0.4.0-incubating && git rev-parse --verify HEAD
    [INFO] Working directory: /home/techgene/tez-0.4.0-incubating
    [INFO] Storing buildScmBranch: UNKNOWN_BRANCH
    [INFO]
    [INFO] — build-helper-maven-plugin:1.8:maven-version (maven-version) @ tez —
    [INFO]
    [INFO] — maven-jar-plugin:2.3.1:test-jar (default) @ tez —
    [WARNING] JAR will be empty – no content was marked for inclusion!
    [INFO]
    [INFO] — maven-install-plugin:2.3.1:install (default-install) @ tez —
    [INFO] Installing /home/techgene/tez-0.4.0-incubating/pom.xml to /home/techgene/.m2/repository/org/apache/tez/tez/0.4.0-incubating/tez-0.4.0-incubating.pom
    [INFO] Installing /home/techgene/tez-0.4.0-incubating/target/tez-0.4.0-incubating-tests.jar to /home/techgene/.m2/repository/org/apache/tez/tez/0.4.0-incubating/tez-0.4.0-incubating-tests.jar
    [INFO]
    [INFO] ————————————————————————
    [INFO] Building tez-api 0.4.0-incubating
    [INFO] ————————————————————————
    [INFO]
    [INFO] — buildnumber-maven-plugin:1.1:create (default) @ tez-api —
    [INFO] Checking for local modifications: skipped.
    [INFO] Updating project files from SCM: skipped.
    [INFO] Executing: /bin/sh -c cd /home/techgene/tez-0.4.0-incubating/tez-api && git rev-parse --verify HEAD
    [INFO] Working directory: /home/techgene/tez-0.4.0-incubating/tez-api
    [INFO] Storing buildNumber: null at timestamp: 1399295237430
    [INFO] Executing: /bin/sh -c cd /home/techgene/tez-0.4.0-incubating/tez-api && git rev-parse --verify HEAD
    [INFO] Working directory: /home/techgene/tez-0.4.0-incubating/tez-api
    [INFO] Storing buildScmBranch: UNKNOWN_BRANCH
    [INFO]
    [INFO] — build-helper-maven-plugin:1.8:maven-version (maven-version) @ tez-api —
    [INFO]
    [INFO] — hadoop-maven-plugins:2.4.0:protoc (compile-protoc) @ tez-api —
    [WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program "protoc": error=2, No such file or directory
    [ERROR] stdout: []
    [INFO] ————————————————————————
    [INFO] Reactor Summary:
    [INFO]
    [INFO] tez ……………………………………….. SUCCESS [1.880s]
    [INFO] tez-api ……………………………………. FAILURE [1.380s]
    [INFO] tez-common …………………………………. SKIPPED
    [INFO] tez-runtime-internals ……………………….. SKIPPED
    [INFO] tez-runtime-library …………………………. SKIPPED
    [INFO] tez-mapreduce ………………………………. SKIPPED
    [INFO] tez-mapreduce-examples ………………………. SKIPPED
    [INFO] tez-tests ………………………………….. SKIPPED
    [INFO] tez-dag ……………………………………. SKIPPED
    [INFO] tez-dist …………………………………… SKIPPED
    [INFO] Tez ……………………………………….. SKIPPED
    [INFO] ————————————————————————
    [INFO] BUILD FAILURE
    [INFO] ————————————————————————
    [INFO] Total time: 4.062s
    [INFO] Finished at: Mon May 05 06:07:17 PDT 2014
    [INFO] Final Memory: 13M/176M
    [INFO] ————————————————————————
    [ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.4.0:protoc (compile-protoc) on project tez-api: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
    [ERROR]
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
    [ERROR]
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR] mvn -rf :tez-api

  2. Thanks for sharing this great Hadoop tutorial. I will use your commands when I upgrade Hadoop. I found a lot of useful information here, and it is what I was searching for. I have bookmarked this page for future reference.

  3. @siva: I think you also have to install Protocol Buffers and the g++ compiler.
    I ran into the same kind of problem the first time, but after installing Protocol Buffers and the g++ compiler it works fine.
