Spark 2.0 - Build, Configuration and Installation Steps Using Apache BigTop

Build Spark 2.0.0 RPM and DEB packages with BigTop ODPi

Introduction

This post describes how to generate the RPM and DEB packages for Spark 2.0.0 based on ODPi 1.1.

Apache BigTop 1.1 supports Spark 1.6.2. Spark 2.0 introduces quite a few changes (e.g., .jar file names and folder structures), so the BigTop packaging must be modified in order to build Spark 2.0.
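
For illustration, on an installed system the layout difference looks roughly like this (the paths are representative of a BigTop-style install and are an assumption here):

# Spark 1.6.x ships a single assembly jar:
ls /usr/lib/spark/lib/spark-assembly-*.jar
# Spark 2.0.0 drops the assembly and ships individual jars instead:
ls /usr/lib/spark/jars/*.jar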

Pre-requisites

  • For CentOS 7: 

    sudo yum install -y git wget tar maven ant gcc gcc-c++ make zip unzip rpm-build autoconf automake cppunit-devel hostname svn libtool chrpath fuse-devel fuse cmake lzo-devel java-1.8.0-openjdk lcms2-devel asciidoc xmlto python-devel python-setuptools libxml2-devel libxslt-devel libyaml-devel cyrus-sasl-devel sqlite-devel openldap-devel mysql-devel ivy openssl-devel zlib-devel snappy-devel jansson-devel  


  • For Debian Jessie: 

    sudo apt-get install -y git python gcc g++ make ca-certificates-java ant curl dpkg-dev debhelper devscripts autoconf automake libtool libcppunit-dev chrpath liblzo2-dev libzip-dev sharutils libfuse-dev libssl-dev cmake pkg-config asciidoc xmlto python2.7-dev libxml2-dev libxslt1-dev libsqlite3-dev libldap2-dev libsasl2-dev libmysqlclient-dev python-setuptools libkrb5-dev rsync build-essential zlib1g-dev libsnappy-dev libjansson-dev fuse
  • Node.js 4.2.1 (a recent release with AArch64 support)
  • Protobuf 2.5.0 (the steps below install 2.6.1)
  • Scala 2.11.1 (2.11.8 works as well)
  • OpenJDK 8 (build 1.8.0_111-b15)
  • Maven 3.3.9

Note: In BigTop ODPi 1.1, the Hadoop version is 2.7.2.

Source Location

https://git.linaro.org/leg/bigdata/bigtop-odpi.git/

Files modified

  • bigtop-packages/src/common/spark/do-component-build
  • bigtop-packages/src/common/spark/install_spark.sh
  • bigtop-packages/src/rpm/spark/SPECS/spark.spec
  • bigtop.bom
  • odpi.bom
  • pom.xml
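
As a rough sketch of the kind of change involved, the Spark version strings in the bom files have to be bumped from 1.6.2 to 2.0.0. The command below only locates the entries to edit; whether both files carry the version string is an assumption, so verify against your checkout:

# Find the Spark 1.6.2 version entries that need to be raised to 2.0.0
grep -n "1.6.2" bigtop.bom odpi.bom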

Validating OpenJDK

Make sure you have the correct OpenJDK version installed:

java -version

It should display 1.8.0_111

Set JAVA_HOME to point to that JDK.
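
For example (replace the path with the location of your JDK 8 installation):

# Adjust the path to match your installed JDK
export JAVA_HOME=/path/to/your/jdk1.8.0
export PATH=$JAVA_HOME/bin:$PATH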

Installing Pre-Requisites

Scala

wget http://downloads.typesafe.com/scala/2.11.1/scala-2.11.1.tgz 
tar xvf scala-2.11.1.tgz 
cd scala-2.11.1 
export SCALA_HOME=$PWD
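
To verify, print the Scala version; it should report 2.11.1:

$SCALA_HOME/bin/scala -version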

Nodejs

wget https://nodejs.org/dist/v4.2.1/node-v4.2.1.tar.gz 
tar xvf node-v4.2.1.tar.gz 
cd node-v4.2.1 
./configure --prefix=/place/to/install/node 
make -j<NUMCORES> 
make install 
cd /place/to/install/node/bin 
export PATH=$PWD:$PATH 
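
To verify, print the Node.js version; it should report v4.2.1:

node --version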

Protobuf


wget https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz
tar xvf protobuf-2.6.1.tar.gz
cd protobuf-2.6.1
./configure --prefix=/place/to/install/protobuf
make -j<NUMCORES>
make install
cd /place/to/install/protobuf/bin
export PATH=$PWD:$PATH
cd /place/to/install/protobuf/lib/pkgconfig
export PKG_CONFIG_PATH=$PWD

Type the following to verify the installation; it should output 2.6.1:

protoc --version

Maven 3.3.9 (for Debian Jessie only)

wget http://mirror.ox.ac.uk/sites/rsync.apache.org/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz 
tar xvf apache-maven-3.3.9-bin.tar.gz 
cd apache-maven-3.3.9/bin 
export PATH=$PWD:$PATH
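
To verify, print the Maven version; it should report Apache Maven 3.3.9:

mvn -version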

Build Procedure

git clone https://git.linaro.org/leg/bigdata/bigtop-odpi.git/
cd bigtop-odpi


Clean up temporary files before starting the build.

./gradlew clean

If a build was done earlier, it might be necessary to delete the ~/.gradle and ~/.m2 folders.

rm -r ~/.gradle
rm -r ~/.m2
./gradlew clean

Run the command below to create the RPM packages:

./gradlew spark-rpm

The above command generates all the configuration and spec files necessary for the build and then generates the Spark RPM packages.

Once the build is successful, the rpm files will be placed in ./bigtop-packages/src/rpm/spark. 

List of Spark RPM files that will be generated:

  • spark-core-2.0.0-1.el7.centos.noarch.rpm
  • spark-history-server-2.0.0-1.el7.centos.noarch.rpm
  • spark-thriftserver-2.0.0-1.el7.centos.noarch.rpm
  • spark-datanucleus-2.0.0-1.el7.centos.noarch.rpm
  • spark-master-2.0.0-1.el7.centos.noarch.rpm
  • spark-worker-2.0.0-1.el7.centos.noarch.rpm
  • spark-extras-2.0.0-1.el7.centos.noarch.rpm
  • spark-python-2.0.0-1.el7.centos.noarch.rpm 
  • spark-yarn-shuffle-2.0.0-1.el7.centos.noarch.rpm
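
One way to confirm the packages were produced is a quick search under the output directory mentioned above (the exact subdirectory layout may vary):

find ./bigtop-packages/src/rpm/spark -name '*.rpm'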

Run the command below to create the DEB packages:

./gradlew spark-deb

The above command generates all the configuration and spec files necessary for the build and then generates the Spark DEB packages.
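
The generated .deb files can be located the same way; their output directory may differ from the RPM path, so search from the top of the tree:

find . -name '*.deb'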


NOTE: The Spark 2.0.0 build uses a Zinc server by default, although Zinc is optional. If you see build failures caused by the Zinc server, killing the Zinc process and re-running the build will resolve them.
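
One generic way to stop a lingering Zinc server before retrying the build (this assumes no other Zinc instances are needed on the machine):

# Kill any running Zinc server, then retry the build
pkill -f zinc || true
./gradlew spark-rpm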

Installing BigTop ODPi Spark 2.0.0 (via RPM)

Build the RPM packages first, if you have not already done so:

./gradlew spark-rpm

Then change to the directory containing the Spark 2.0.0 .rpm files and run "yum install *.rpm" as root. Installing Spark 2.0.0 also requires Hadoop 2.7.2, so install all the dependent Hadoop components as well.
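
For example, using a placeholder for the directory holding the packages:

cd <directory-containing-the-spark-rpms>
sudo yum install -y *.rpm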

SparkPi

The command "run-example SparkPi 10" is used to run SparkPi test. Here 10 means run SparkPi test 10 times. Please check the trace/log. If the test passed, there is NO any "error" in the log. Spark will depend on Hadoop libraries, please ensure Hadoop libraries have been added in the environment parameter, SPARK_CLASSPATH.

 

export SPARK_CLASSPATH=${SPARK_CLASSPATH}:\
/usr/lib/hadoop/hadoop-common-2.7.2.jar:\
/usr/lib/hadoop/client/hadoop-hdfs-2.7.2.jar:\
/usr/lib/spark/jars:\
/usr/lib/hadoop/hadoop-annotations-2.7.2.jar:\
/usr/lib/hadoop/hadoop-auth-2.7.2.jar:\
/usr/lib/hadoop/hadoop-nfs-2.7.2.jar:\
/usr/lib/hadoop-hdfs/hadoop-hdfs-2.7.2.jar:\
/usr/lib/hadoop-yarn/hadoop-yarn-client-2.7.2.jar:\
/usr/lib/hadoop-yarn/hadoop-yarn-common-2.7.2.jar:\
/usr/lib/hadoop-yarn/hadoop-yarn-api-2.7.2.jar:\
/usr/lib/hadoop-yarn/hadoop-yarn-server-common-2.7.2.jar:\
/usr/lib/hadoop-yarn/hadoop-yarn-server-web-proxy-2.7.2.jar:\
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-2.7.2.jar:\
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-2.7.2.jar:\
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.7.2.jar:\
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.2.jar:\
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-2.7.2.jar:\
.
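
With the classpath in place, run the example and look for the estimated value of Pi in the output:

# A successful run prints a line like "Pi is roughly 3.14..."
run-example SparkPi 10 2>&1 | grep "Pi is roughly"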