Build Spark 2.0.0 RPM and DEB packages with BigTop ODPi

Introduction

This post describes how to generate the RPM and DEB packages for Spark 2.0.0 based on ODPi 1.1.

Apache BigTop 1.1 supports Spark 1.6.2. Spark 2.0 introduces quite a few changes (e.g., .jar file names and folder structure), so the BigTop packaging has to be modified before it can build Spark 2.0.

Pre-requisites

Note: In BigTop ODPi 1.1, the Hadoop version is 2.7.2.

Source Location

https://git.linaro.org/leg/bigdata/bigtop-odpi.git/

Files modified

Validating OpenJDK

Make sure you have the right OpenJDK version 

java -version

The output should report version 1.8.0_111.

Set JAVA_HOME to point to the pertinent JDK. 
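
The version check above can also be scripted. The following is a minimal sketch, assuming that "java -version" prints a line containing version "<x>" (as OpenJDK 8 does). The function names parse_java_version and check_java_version are hypothetical helpers, not part of BigTop.

```shell
# Extract the quoted version string from `java -version` style output on stdin.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Return success only if the JDK on the PATH reports the expected version.
# `java -version` writes to stderr, hence the 2>&1.
check_java_version() {
    expected="$1"
    actual="$(java -version 2>&1 | parse_java_version)"
    [ "$actual" = "$expected" ]
}

# Report the result only when a JDK is actually on the PATH.
if command -v java >/dev/null 2>&1; then
    if check_java_version "1.8.0_111"; then
        echo "OpenJDK version OK"
    else
        echo "Unexpected Java version; check JAVA_HOME and PATH" >&2
    fi
fi
```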

Installing Pre-Requisites

Scala

wget http://downloads.typesafe.com/scala/2.11.1/scala-2.11.1.tgz 
tar xvf scala-2.11.1.tgz 
cd scala-2.11.1 
export SCALA_HOME=$PWD

Nodejs

wget https://nodejs.org/dist/v4.2.1/node-v4.2.1.tar.gz 
tar xvf node-v4.2.1.tar.gz 
cd node-v4.2.1 
./configure --prefix=/place/to/install/node 
make -j<NUMCORES> 
make install 
cd /place/to/install/node/bin 
export PATH=$PWD:$PATH 
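
The <NUMCORES> placeholder in the make commands of this post stands for the number of parallel build jobs. As a small sketch, it can be detected instead of typed by hand; the fallback chain (nproc, then getconf, then 1) is an assumption about which tools are available on the build host.

```shell
# Detect the number of online CPU cores; fall back to 1 if neither tool works.
NUMCORES="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
echo "Building with $NUMCORES parallel jobs"

# Then, inside the unpacked source tree:
#   make -j"$NUMCORES"
```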

Protobuf


wget https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz
tar xvf protobuf-2.6.1.tar.gz
cd protobuf-2.6.1
./configure --prefix=/place/to/install/protobuf
make -j<NUMCORES>
make install
cd /place/to/install/protobuf/bin
export PATH=$PWD:$PATH
cd /place/to/install/protobuf/lib/pkgconfig
export PKG_CONFIG_PATH=$PWD

Type the following to check the installation (it should print libprotoc 2.6.1):

protoc --version

Maven 3.3.9 (for Debian Jessie only)

wget http://mirror.ox.ac.uk/sites/rsync.apache.org/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz 
tar xvf apache-maven-3.3.9-bin.tar.gz 
cd apache-maven-3.3.9/bin 
export PATH=$PWD:$PATH
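
Before starting the build, it can help to confirm that the tools installed above are visible in the current shell. The sketch below assumes the binaries are named java, node, protoc and mvn; missing_tools is a hypothetical helper name.

```shell
# Print the names of the given commands that are not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing="$(missing_tools java node protoc mvn)"
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing >&2
fi

# Scala is located through SCALA_HOME rather than PATH in the steps above.
if [ ! -d "${SCALA_HOME:-}" ]; then
    echo "SCALA_HOME is not set or does not point to a directory" >&2
fi
```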

Build Procedure

git clone https://git.linaro.org/leg/bigdata/bigtop-odpi.git/
cd bigtop-odpi


Clean up temporary files before making the build.

./gradlew clean

If a build was done earlier, it might be necessary to delete the ~/.gradle and ~/.m2 folders first.

rm -r ~/.gradle
rm -r ~/.m2
./gradlew clean
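
Deleting ~/.m2 also removes any custom settings.xml stored there and forces every dependency to be downloaded again. As a more cautious alternative, the sketch below moves a directory aside instead of deleting it. The helper name move_aside and the .bak.<timestamp> suffix are arbitrary choices made for illustration.

```shell
# Rename a directory to <name>.bak.<epoch-seconds> if it exists.
move_aside() {
    dir="$1"
    if [ -d "$dir" ]; then
        mv "$dir" "$dir.bak.$(date +%s)"
    fi
}

# Usage, in place of the rm -r commands:
#   move_aside "$HOME/.gradle"
#   move_aside "$HOME/.m2"
```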

Run the command below to create the RPM packages.

./gradlew spark-rpm

The above command generates all the configuration and spec files necessary for the build and then generates the Spark RPM packages.

Once the build is successful, the rpm files will be placed in ./bigtop-packages/src/rpm/spark. 
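
To see which package files a build actually produced, the output directory can be searched. This is a minimal sketch: list_packages is a hypothetical helper, and the directory used in the example is the location stated above.

```shell
# List package files with the given extension under a directory, sorted by path.
list_packages() {
    dir="$1"
    ext="$2"
    find "$dir" -type f -name "*.$ext" | sort
}

# Only search the output location if it exists in the current directory.
if [ -d ./bigtop-packages/src/rpm/spark ]; then
    list_packages ./bigtop-packages/src/rpm/spark rpm
fi
```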

List of Spark RPM files that will be generated:

Run the command below to create the DEB packages.

./gradlew spark-deb

The above command generates all the configuration and spec files necessary for the build and then generates the Spark DEB packages.


NOTE: The Spark 2.0.0 build uses the Zinc server by default, but Zinc is optional. If the build fails because of the Zinc server, kill the Zinc server process and run the build again; this resolves the failure.

Installing BigTop ODPi Spark 2.0.0 (via RPM)

To install, first build the RPM packages if you have not done so already:

./gradlew spark-rpm

Change to the directory that holds the Spark 2.0.0 .rpm files and run the command "yum install *.rpm" as root. Spark 2.0.0 requires Hadoop 2.7.2, so install Hadoop and all other dependent components at the same time.
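
The install step can be wrapped in a small function that first checks that the directory really contains .rpm files. This is only a sketch: install_rpms is a hypothetical helper, the use of sudo to obtain root rights is an assumption, and the DRY_RUN switch is added here so that the command can be previewed before it is run.

```shell
# Install every .rpm in a directory with yum.
# With DRY_RUN=1 the command is printed instead of executed.
install_rpms() {
    dir="$1"
    # Replace the function's arguments with the matching package files.
    set -- "$dir"/*.rpm
    if [ ! -e "$1" ]; then
        echo "No .rpm files found in $dir" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo sudo yum install "$@"
    else
        sudo yum install "$@"
    fi
}

# Usage:
#   DRY_RUN=1 install_rpms /path/to/spark/rpms   # preview
#   install_rpms /path/to/spark/rpms             # install
```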

SparkPi

Run the SparkPi test with the command "run-example SparkPi 10". Here 10 is the number of slices (partitions) the computation is split into. Check the trace/log afterwards: if the test passed, the log contains no errors. Spark depends on the Hadoop libraries, so make sure they have been added to the SPARK_CLASSPATH environment variable.
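
The pass check can be scripted as well. SparkPi prints a line starting with "Pi is roughly" when it completes, so the sketch below looks for that line in the captured output. The log file name sparkpi.log and the helper name sparkpi_passed are assumptions made for illustration, and run-example is assumed to be on the PATH.

```shell
# Succeed if the text on stdin contains SparkPi's result line.
sparkpi_passed() {
    grep -q "Pi is roughly"
}

# Run the example only when run-example is on the PATH; keep the output for inspection.
if command -v run-example >/dev/null 2>&1; then
    run-example SparkPi 10 > sparkpi.log 2>&1 || true
    if sparkpi_passed < sparkpi.log; then
        echo "SparkPi passed"
    else
        echo "SparkPi did not print a result; see sparkpi.log" >&2
    fi
fi
```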

 

export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/usr/lib/hadoop/hadoop-common-2.7.2.jar:/usr/lib/hadoop/client/hadoop-hdfs-2.7.2.jar:/usr/lib/spark/jars:/usr/lib/hadoop/hadoop-annotations-2.7.2.jar:/usr/lib/hadoop/hadoop-auth-2.7.2.jar:/usr/lib/hadoop/hadoop-nfs-2.7.2.jar:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-client-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-api-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-common-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-web-proxy-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-2.7.2.jar:.
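
A hand-written classpath is easy to get wrong, so it can instead be assembled from the jar directories. This is a sketch under assumptions: build_classpath is a hypothetical helper, the directories are the ones that appear in the export line above, and it picks up every jar in those directories, which is more than the hand-picked list.

```shell
# Join every .jar directly inside the given directories with ':'.
build_classpath() {
    cp=""
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            # Skip the unexpanded pattern when a directory has no jars.
            [ -e "$jar" ] || continue
            cp="${cp:+$cp:}$jar"
        done
    done
    echo "$cp"
}

HADOOP_JARS="$(build_classpath /usr/lib/hadoop /usr/lib/hadoop-hdfs \
    /usr/lib/hadoop-yarn /usr/lib/hadoop-mapreduce)"
if [ -n "$HADOOP_JARS" ]; then
    export SPARK_CLASSPATH="${SPARK_CLASSPATH:+$SPARK_CLASSPATH:}$HADOOP_JARS"
fi
```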