Compile Hadoop Binaries

It should come as no surprise that Hadoop is not officially supported on the Raspberry Pi and has no pre-built native libraries for it. Not to worry: it turns out to be pretty straightforward to compile from source. This post is based on version 2.6.5, but the process is very similar for other releases, and you can find a number of versions I have compiled on my downloads page. The version I am currently running on my cluster is 2.6.4; 2.6.5 was compiled ready for the upgrade.

More details of the releases can be found here http://hadoop.apache.org/releases.html

1. Download Source code
2. Validate and unpack
3. Install prerequisites
     a. Maven 3.0 or greater
     b. JDK 1.6 or greater
     c. ProtocolBuffer 2.5.0
     d. Cmake 2.6 or newer
     e. Zlib devel
     f. openssl devel
     g. Additional items
4. Apply HADOOP-9320 patch
5. Update pom.xml for javadoc issue on compile
6. Compile
7. Job Done!

1. Download Source code

You can select one of the very many mirrors listed on the release page above.

wget http://apache.mirrors.lucidnetworks.net/hadoop/common/hadoop-2.6.5/hadoop-2.6.5-src.tar.gz

2. Validate and unpack

There are two ways to validate the download: a simple check using SHA-256, or a more complete check using GPG. The difference? A plain checksum validates that the download completed intact; it doesn’t tell you whether the code has been altered in any way. Someone could alter the source code, calculate a new checksum and place both on a website for you to download, and you would never know it had been tampered with.
GPG uses a private key known only to the signer, so a valid signature confirms both that the download is complete and that the contents have not been tampered with. GPG is installed by default on most distributions, including the Raspbian Lite that I use.

Download the signature file and KEYS file and verify.

wget https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.6.5/hadoop-2.6.5-src.tar.gz.asc
wget https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
gpg --import KEYS
gpg --verify hadoop-2.6.5-src.tar.gz.asc

Don’t worry about the warning message; it just means you haven’t signed the Apache keys with your own key. I will do another post talking about how to sign files with GPG (namely the binaries generated from this post!). If you get a BAD signature, stop: the file could have been tampered with. Best choose another mirror.
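If you also want to run the simpler checksum comparison described above, a minimal sketch could look like the following. The helper name verify_sha256 is made up for illustration, and the expected digest is something you paste in yourself from the published release files (the exact file to take it from is not covered in this post, so treat that part as an assumption).

```shell
# Hypothetical helper: compare a file's SHA-256 against a digest you supply.
# Usage: verify_sha256 <file> <expected-hex-digest>
verify_sha256() {
    file=$1
    # Normalise the supplied digest to lower case, as sha256sum prints lower case.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

You would call it as verify_sha256 hadoop-2.6.5-src.tar.gz followed by the published digest. Remember that this only tells you the download is intact; the GPG check is still the one that tells you who produced the file.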

Now we just need to unpack the files

tar -zxvf hadoop-2.6.5-src.tar.gz

This will unpack the files into a hadoop-2.6.5-src directory

3. Install prerequisites

It’s always a good idea to make sure that the repositories are up to date with the latest package lists.

sudo apt-get update

There is a file in the source directory called BUILDING.txt that describes the build process and lists all the prerequisites needed to compile the software:

----------------------------------------------------------------------------------
Requirements:
* Unix System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
----------------------------------------------------------------------------------

a. Maven

Maven is a build tool for Java-based projects that shields us from a lot of detail. At its core is the pom.xml file, which holds the project description and the configuration needed to build it. We will need to make a small adjustment to the pom.xml for the Hadoop build later on. For more information check out the Maven site https://maven.apache.org/

At the time of writing the latest version of Maven packaged for the Pi is 3.0.5-3, which should work fine.

sudo apt-get install maven

b. JDK 1.6 or greater

Whilst it says JDK 1.6, everything else I have runs on JDK 1.8, so I will use that instead. I found that I needed to install the JDK after Maven: installing Maven also pulls in a Java 1.7 runtime environment (only the runtime, not the full JDK) and switches all the settings to use 1.7, which caused my compiles to fail. Annoying, but installing Java after Maven seems to do the trick.

sudo apt-get install oracle-java8-jdk
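Since the order of installation matters here, it can be worth confirming which Java is actually active before going any further. The sketch below is one way to do that; the helper name java_version_string is made up for illustration, and it assumes the usual first line of java -version output, which has the version in double quotes.

```shell
# Hypothetical helper: pull the quoted version out of `java -version` output.
# Only the first line of the text passed in is inspected.
java_version_string() {
    printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}
```

Note that java -version writes to stderr, so call it as java_version_string "$(java -version 2>&1)" and look for a version starting with 1.8. Running javac -version as well confirms that the full JDK, and not just a runtime, is on the path. If the wrong version is still selected, sudo update-alternatives --config java lets you switch on Debian-based systems.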

c. ProtocolBuffer 2.5.0

Protobuf needs to be downloaded and compiled. Hadoop requires a specific version that is getting quite old now. For more information on what protobuf is, see https://developers.google.com/protocol-buffers/?csw=1

wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar xzvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix=/usr
make
make check
sudo make install
cd ..
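Because Hadoop is fussy about the protobuf version, it is worth checking what ended up installed. A minimal sketch, assuming that protoc --version prints "libprotoc 2.5.0" for this release; the helper name is_required_protoc is made up for illustration.

```shell
# Hypothetical helper: succeed only if the given `protoc --version` output
# names the 2.5.0 release that this Hadoop build expects.
is_required_protoc() {
    [ "$1" = "libprotoc 2.5.0" ]
}
```

Call it as is_required_protoc "$(protoc --version)" && echo "protoc 2.5.0 found". If it stays silent, a different protobuf version (or none at all) is on the path.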

d. Cmake 2.6 or newer

CMake is another build tool, needed here for compiling the native (non-Java) components.

https://cmake.org/

At time of writing 3.6.2-2 is the latest version compiled for the pi so that will do nicely again.

sudo apt-get install cmake

e. Zlib devel

Zlib is a free, platform independent compression library https://zlib.net/

sudo apt-get install zlib1g-dev

You will probably find (at least I did) that this is already installed, possibly as a dependency for Maven.

f. openssl devel

If you have managed to get this far into the wonders of Linux and the Raspberry Pi without encountering cryptography then you have done well! It can seem a daunting subject at first with talk of public and private keys etc. Well this is the development library for all of that…

sudo apt-get install libssl-dev

g. Additional Items

When I first compiled, it all appeared to work fine; it was only after I checked the native libraries that I found that snappy and bzip2 were not present. These need some additional prerequisites that are not listed in the BUILDING.txt file. Adding in the following did the trick.

sudo apt-get install libsnappy-dev libbz2-dev
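With all the prerequisites in place, a quick sanity check that the command-line tools are on the path can save a failed build later. This is only a sketch: the helper name missing_tools is made up, and it checks for commands only, not for the development libraries (zlib, openssl, snappy, bzip2).

```shell
# Hypothetical helper: print the name of every listed command that is
# not found on the PATH. Prints nothing if they are all present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For this build you might run missing_tools mvn javac protoc cmake and expect no output.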

4. Apply HADOOP-9320 patch
Now we need to apply a patch for some issues with the ARM version of the JVM. More info can be found here:

https://issues.apache.org/jira/browse/HADOOP-9320

cd hadoop-2.6.5-src/hadoop-common-project/hadoop-common/src
wget https://issues.apache.org/jira/secure/attachment/12570212/HADOOP-9320.patch
patch < HADOOP-9320.patch
cd ~/hadoop-2.6.5-src

5. Update pom.xml for javadoc issue on compile

The javadoc version in Java 8 is considerably stricter than the one in earlier versions. It now signals an error if it detects what it considers to be invalid markup, including the presence of an end tag where one isn’t expected, which causes the compile to fail.

To turn off this checking in javadoc we need to add a new property into pom.xml

<additionalparam>-Xdoclint:none</additionalparam>

Again more information can be found here: https://stackoverflow.com/questions/24615547/cant-build-hadoop-2-4-1-with-java8
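Before kicking off a long compile it can be handy to confirm the property really made it into the file you edited. A minimal sketch; the helper name has_doclint_off is made up, and which pom.xml you pass to it depends on where you added the property.

```shell
# Hypothetical helper: succeed if the given pom file contains the
# property that switches off the strict javadoc checks.
has_doclint_off() {
    grep -q '<additionalparam>-Xdoclint:none</additionalparam>' "$1"
}
```

For example, has_doclint_off pom.xml && echo "doclint disabled" run from the directory holding the pom.xml you changed.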

6. Compile

mvn package -Pdist,native -DskipTests -Dtar

7. Job Done!

Within the source folder there is now a hadoop-dist folder, and within that a target folder that contains everything we need.

cd hadoop-dist/target
ls -la

The two main items are the hadoop-2.6.5 folder, which contains a copy of the working installation, and hadoop-2.6.5.tar.gz, which can be used to deploy directly to other servers without needing to recompile each time. A number of these can be found on my downloads page.

Now for some simple checks (remember to make sure your java home is set)

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:jre/bin/java::")
hadoop-2.6.5/bin/hadoop version
hadoop-2.6.5/bin/hadoop checknative -a
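The checknative output is easy to skim past, so a small filter that prints only the libraries that failed can help. This is a sketch that assumes each library is reported on its own line as a name followed by a colon and then true or false; the helper name missing_native_libs is made up for illustration.

```shell
# Hypothetical helper: read `hadoop checknative -a` output on stdin and
# print the name of every library reported as false.
missing_native_libs() {
    awk '$2 == "false" { sub(/:$/, "", $1); print $1 }'
}
```

Use it as hadoop-2.6.5/bin/hadoop checknative -a 2>&1 | missing_native_libs. No output means every native library was found; anything listed points at a missing development package.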

For the keen-eyed who have noticed the different timestamps: on the first attempt the check for native libraries failed and I needed to recompile with the additional prerequisites.

The out-takes!

Screenshot showing the javadoc issue that needed the pom.xml file changes

The first compile that I thought worked… Notice that snappy and bzip2 have failed the native library check.

Notes on compiling other versions:

2.7.3:

There are a few minor differences here.
There is an additional dependency on FUSE

sudo apt-get install fuse

You no longer need to update pom.xml for the Javadoc issue but the Hadoop patch is still needed.
