Spark 2.0 - Build, Configuration and Installation Steps using Apache BigTop
Build Spark 2.0.0 RPM and DEB packages with BigTop ODPi
Introduction
This post describes how to generate the RPM and DEB packages for Spark 2.0.0 based on ODPi 1.1.
Apache BigTop 1.1 supports Spark 1.6.2. With Spark 2.0 there are quite a few changes (e.g., .jar file names and folder structures). Modification to BigTop packaging is necessary for building Spark 2.0.
Pre-requisites
For CentOS 7:
sudo yum install -y git wget tar maven ant gcc gcc-c++ make zip unzip rpm-build autoconf automake cppunit-devel hostname svn libtool chrpath fuse-devel fuse cmake lzo-devel java-1.8.0-openjdk lcms2-devel asciidoc xmlto python-devel python-setuptools libxml2-devel libxslt-devel libyaml-devel cyrus-sasl-devel sqlite-devel openldap-devel mysql-devel ivy openssl-devel zlib-devel snappy-devel jansson-devel
For Debian Jessie:
sudo apt-get install -y git python gcc g++ make ca-certificates-java ant curl dpkg-dev debhelper devscripts autoconf automake libtool libcppunit-dev chrpath liblzo2-dev libzip-dev sharutils libfuse-dev libssl-dev cmake pkg-config asciidoc xmlto python2.7-dev libxml2-dev libxslt1-dev libsqlite3-dev libldap2-dev libsasl2-dev libmysqlclient-dev python-setuptools libkrb5-dev rsync build-essential zlib1g-dev libsnappy-dev libjansson-dev fuse
- A recent version of node.js for AArch64 - 4.2.1.
- Protobuf 2.5.0
- Scala 2.11.1 (2.11.8 is OK too)
- OpenJDK 8 (build 1.8.0_111-b15).
- Maven 3.3.9
Note: In BigTop ODPi 1.1, the Hadoop version is 2.7.2.
Source Location
https://git.linaro.org/leg/bigdata/bigtop-odpi.git/
Files modified
- bigtop-packages/src/common/spark/do-component-build
- bigtop-packages/src/common/spark/install_spark.sh
- bigtop-packages/src/rpm/spark/SPECS/spark.spec
- bigtop.bom
- odpi.bom
- pom.xml
Validating OpenJDK
Make sure you have the right OpenJDK version
java -version
It should display 1.8.0_111
Set JAVA_HOME to point to the pertinent JDK.
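The version check and the JAVA_HOME setting above can be scripted. The following is a minimal sketch, not part of the BigTop tooling: the helper names parse_java_version and is_java8 are hypothetical, and deriving JAVA_HOME from the location of the javac binary is an assumption that holds for typical OpenJDK package layouts but should be verified on your system.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of BigTop.

# Print the quoted version string found in `java -version` output text.
parse_java_version() {
    printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed if the given version string belongs to Java 8 (1.8.x).
is_java8() {
    case "$1" in
        1.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr, so redirect it into the capture.
    found=$(parse_java_version "$(java -version 2>&1)")
    if is_java8 "$found"; then
        echo "Java version OK: $found"
    else
        echo "Unexpected Java version: $found" >&2
    fi
fi

# Assumption: JAVA_HOME is two directories above the resolved javac binary.
if command -v javac >/dev/null 2>&1; then
    JAVA_HOME=$(dirname "$(dirname "$(readlink -f "$(command -v javac)")")")
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
fi
```

If several JDKs are installed, set JAVA_HOME by hand instead of relying on whichever javac happens to be first on PATH.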
Installing Pre-Requisites
Scala
wget http://downloads.typesafe.com/scala/2.11.1/scala-2.11.1.tgz
tar xvf scala-2.11.1.tgz
cd scala-2.11.1
export SCALA_HOME=$PWD
Nodejs
wget https://nodejs.org/dist/v4.2.1/node-v4.2.1.tar.gz
tar xvf node-v4.2.1.tar.gz
cd node-v4.2.1
./configure --prefix=/place/to/install/node
make -j<NUMCORES>
make install
cd /place/to/install/node/bin
export PATH=$PWD:$PATH
Protobuf
wget https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz
tar xvf protobuf-2.6.1.tar.gz
cd protobuf-2.6.1
./configure --prefix=/place/to/install/protobuf
make -j<NUMCORES>
make install
cd /place/to/install/protobuf/bin
export PATH=$PWD:$PATH
cd /place/to/install/protobuf/lib/pkgconfig
export PKG_CONFIG_PATH=$PWD
Type the following to check the installation (it should report version 2.6.1):
protoc --version
Maven 3.3.9 (for Debian Jessie only)
wget http://mirror.ox.ac.uk/sites/rsync.apache.org/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
tar xvf apache-maven-3.3.9-bin.tar.gz
cd apache-maven-3.3.9/bin
export PATH=$PWD:$PATH
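Before starting the build, it can help to confirm that the tools installed above are actually reachable. The following is a minimal sketch: missing_tools is a hypothetical helper, and the tool list simply mirrors the pre-requisites in this post (Scala is checked through SCALA_HOME because the steps above export that variable rather than extending PATH).

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of BigTop.

# Print each named command that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools java mvn node protoc)
if [ -n "$missing" ]; then
    # Intentionally unquoted so the names are printed on one line.
    echo "Not found on PATH:" $missing >&2
else
    echo "All build tools found on PATH"
fi

# The Scala step exports SCALA_HOME instead of adding scala to PATH.
if [ -z "${SCALA_HOME:-}" ]; then
    echo "SCALA_HOME is not set" >&2
fi
```

The PATH and SCALA_HOME exports only last for the current shell session, so run the check in the same shell that will run the build.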
Build Procedure
git clone https://git.linaro.org/leg/bigdata/bigtop-odpi.git/
cd bigtop-odpi
Clean up temporary files before making the build.
./gradlew clean
It might be necessary to delete the ~/.gradle and ~/.m2 folders if a build was done earlier.
rm -r ~/.gradle
rm -r ~/.m2
./gradlew clean
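Deleting ~/.m2 discards every Maven artifact downloaded so far. As a more cautious alternative to removing the caches outright (this is not what the procedure above does), they can be moved aside and restored later if needed. A minimal sketch; stash_dir is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper; an alternative to deleting the caches.

# Rename DIR to DIR.bak.<timestamp> if it exists and print the new name.
# Does nothing (and prints nothing) when DIR is absent.
stash_dir() {
    dir=$1
    [ -d "$dir" ] || return 0
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && printf '%s\n' "$backup"
}

# Usage (run from the bigtop-odpi checkout):
#   stash_dir "$HOME/.gradle"
#   stash_dir "$HOME/.m2"
#   ./gradlew clean
```

The stashed directories can be deleted once the new build has succeeded.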
Run the command below to create the RPM packages:
./gradlew spark-rpm
The above command generates all the configuration and spec files necessary for the build and then generates the Spark RPM packages.
Once the build is successful, the rpm files will be placed in ./bigtop-packages/src/rpm/spark.
List of Spark RPM files that will be generated:
- spark-core-2.0.0-1.el7.centos.noarch.rpm
- spark-history-server-2.0.0-1.el7.centos.noarch.rpm
- spark-thriftserver-2.0.0-1.el7.centos.noarch.rpm
- spark-datanucleus-2.0.0-1.el7.centos.noarch.rpm
- spark-master-2.0.0-1.el7.centos.noarch.rpm
- spark-worker-2.0.0-1.el7.centos.noarch.rpm
- spark-extras-2.0.0-1.el7.centos.noarch.rpm
- spark-python-2.0.0-1.el7.centos.noarch.rpm
- spark-yarn-shuffle-2.0.0-1.el7.centos.noarch.rpm
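The list above can be turned into a quick post-build check. The following is a minimal sketch: missing_rpms is a hypothetical helper, the sub-package names are copied from the list, and the search is recursive because the exact sub-directory that holds the packages is not assumed.

```shell
#!/bin/sh
# Hypothetical post-build check; not part of BigTop.

# Sub-package names taken from the list of generated RPM files.
SPARK_SUBPACKAGES="core history-server thriftserver datanucleus master worker extras python yarn-shuffle"

# Print the expected RPM file names that are not found under directory $1.
# $2 is the version-release-arch suffix, e.g. 2.0.0-1.el7.centos.noarch
missing_rpms() {
    for sub in $SPARK_SUBPACKAGES; do
        f="spark-$sub-$2.rpm"
        # Search sub-directories too; print the name if nothing matches.
        [ -n "$(find "$1" -type f -name "$f" 2>/dev/null)" ] || printf '%s\n' "$f"
    done
}

# Usage (run from the bigtop-odpi checkout after the build):
#   missing_rpms ./bigtop-packages/src/rpm/spark 2.0.0-1.el7.centos.noarch
```

An empty result means every expected package file was found.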
Run the command below to create the DEB packages:
./gradlew spark-deb
The above command generates all the configuration and spec files necessary for the build and then generates the Spark DEB packages.
NOTE: The Spark 2.0.0 build uses the Zinc server by default, although Zinc is optional. If the build fails because of the Zinc server, kill the Zinc server process and rerun the build.
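Finding the Zinc server process can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, and the match is simply on the text "zinc" in the command line, which is assumed (not confirmed by this post) to appear in the Zinc server's Java command. Review the listed processes before killing anything.

```shell
#!/bin/sh
# Hypothetical helpers; the "zinc" match is an assumption about the
# server's command line, so inspect the result before killing anything.

# Read `ps` lines ("PID COMMAND...") on stdin and print the PIDs whose
# command line mentions zinc. The [z] bracket keeps this awk process,
# whose own arguments contain the pattern text, from matching itself.
filter_zinc_pids() {
    awk '$1 ~ /^[0-9]+$/ && /[z]inc/ { print $1 }'
}

# PIDs of the current user's processes that look like a Zinc server.
list_zinc_pids() {
    ps -u "$(id -u)" -o pid= -o args= 2>/dev/null | filter_zinc_pids || true
}

# Usage:
#   list_zinc_pids              # review the PIDs first
#   kill $(list_zinc_pids)      # then stop them
#   ./gradlew spark-rpm         # and rerun the build
```

A script whose own file name contains "zinc" would match itself, so give the script a neutral name.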
Installing BigTop ODPi Spark 2.0.0 (via RPM)
To install, first build the RPM packages if this has not been done already:
./gradlew spark-rpm
Change to the directory containing the Spark 2.0.0 .rpm files and run the command "yum install *.rpm" as root. Spark 2.0.0 requires Hadoop 2.7.2, so install Hadoop and all other dependent components at the same time.
SparkPi
The command "run-example SparkPi 10" runs the SparkPi test. The argument 10 is the number of slices (partitions) the computation is split into. Check the trace/log: if the test passed, there are no "error" entries in the log. Spark depends on the Hadoop libraries, so make sure they have been added to the SPARK_CLASSPATH environment variable:
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/usr/lib/hadoop/hadoop-common-2.7.2.jar:/usr/lib/hadoop/client/hadoop-hdfs-2.7.2.jar:/usr/lib/spark/jars
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/usr/lib/hadoop/hadoop-annotations-2.7.2.jar:/usr/lib/hadoop/hadoop-auth-2.7.2.jar:/usr/lib/hadoop/hadoop-nfs-2.7.2.jar
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.7.2.jar
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/usr/lib/hadoop-yarn/hadoop-yarn-client-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-api-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-common-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-web-proxy-2.7.2.jar
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-2.7.2.jar
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:.
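A long, hand-assembled classpath easily picks up repeated entries, and scanning the SparkPi log by eye is error-prone. The following is a minimal sketch of both checks. The helper names are hypothetical; the pass criterion relies on SparkPi printing a line beginning with "Pi is roughly", which is an addition to the "no error in the log" check described above rather than a replacement for it.

```shell
#!/bin/sh
# Hypothetical helpers for the SparkPi smoke test; not part of BigTop.

# Remove empty and repeated entries from a colon-separated classpath,
# keeping the first occurrence of each entry in its original position.
dedupe_classpath() {
    printf '%s' "$1" | tr ':' '\n' | awk 'NF && !seen[$0]++' | paste -sd ':' -
}

# Succeed if SparkPi output (read from stdin) contains its result line.
sparkpi_passed() {
    grep -q 'Pi is roughly'
}

# Usage:
#   export SPARK_CLASSPATH=$(dedupe_classpath "$SPARK_CLASSPATH")
#   run-example SparkPi 10 > sparkpi.log 2>&1
#   sparkpi_passed < sparkpi.log && echo "SparkPi passed"
```

Keeping the first occurrence of each entry preserves the lookup order of the original classpath.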