Spark 2.0 - Build, Configuration and Installation Steps Using Apache BigTop v1.0
- 1 Build Spark 2.0.0 RPM and DEB packages with BigTop ODPi
- 1.1 Introduction
- 1.2 Pre-requisites
- 1.3 Source Location
- 1.4 Files modified
- 1.5 Validating OpenJDK
- 1.6 Installing Pre-Requisites
- 1.6.1 Scala
- 1.6.2 Nodejs
- 1.6.3 Protobuf
- 1.6.4 Maven 3.3.9 (for Debian Jessie only)
- 1.7 Build Procedure
- 2 Installing BigTop ODPi Spark2.0.0 (via RPM)
- 3 SparkPi
Build Spark 2.0.0 RPM and DEB packages with BigTop ODPi
Introduction
This post describes how to generate the RPM and DEB packages for Spark 2.0.0 based on ODPi 1.1.
Apache BigTop 1.1 supports Spark 1.6.2. Spark 2.0 introduces quite a few changes (e.g., .jar file names and folder structures), so the BigTop packaging has to be modified before it can build Spark 2.0.
Pre-requisites
For CentOS 7:
sudo yum install -y git wget tar maven ant gcc gcc-c++ make zip unzip rpm-build autoconf automake cppunit-devel hostname svn libtool chrpath fuse-devel fuse cmake lzo-devel java-1.8.0-openjdk lcms2-devel asciidoc xmlto python-devel python-setuptools libxml2-devel libxslt-devel libyaml-devel cyrus-sasl-devel sqlite-devel openldap-devel mysql-devel ivy openssl-devel zlib-devel snappy-devel jansson-devel
For Debian Jessie:
sudo apt-get install -y git python gcc g++ make ca-certificates-java ant curl dpkg-dev debhelper devscripts autoconf automake libtool libcppunit-dev chrpath liblzo2-dev libzip-dev sharutils libfuse-dev libssl-dev cmake pkg-config asciidoc xmlto python2.7-dev libxml2-dev libxslt1-dev libsqlite3-dev libldap2-dev libsasl2-dev libmysqlclient-dev python-setuptools libkrb5-dev rsync build-essential zlib1g-dev libsnappy-dev libjansson-dev fuse
A recent version of node.js for AArch64 - 4.2.1.
Protobuf 2.5.0
Scala 2.11.1 (2.11.8 is OK too)
OpenJDK 8 (build 1.8.0_111-b15).
Maven 3.3.9
Note: In BigTop ODPi 1.1, the Hadoop version is 2.7.2.
Source Location
https://git.linaro.org/leg/bigdata/bigtop-odpi.git/
Files modified
bigtop-packages/src/common/spark/do-component-build
bigtop-packages/src/common/spark/install_spark.sh
bigtop-packages/src/rpm/spark/SPECS/spark.spec
bigtop.bom
odpi.bom
pom.xml
Validating OpenJDK
Make sure you have the right OpenJDK version:
java -version
It should display 1.8.0_111.
Set JAVA_HOME to point to the corresponding JDK.
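The two checks above can also be scripted. The following is a minimal sketch only; the function names java_version_of and java_home_ok are made up for this illustration and are not part of BigTop or the JDK.

```shell
# Extract the quoted version string from `java -version`-style text on stdin.
# Note: `java -version` writes to stderr, so callers need to redirect 2>&1.
java_version_of() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if JAVA_HOME is set and contains an executable bin/java.
java_home_ok() {
    [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]
}
```

A possible use, assuming java is on the PATH: run ver=$(java -version 2>&1 | java_version_of), compare "$ver" with 1.8.0_111, and call java_home_ok after exporting JAVA_HOME.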
Installing Pre-Requisites
Scala
wget http://downloads.typesafe.com/scala/2.11.1/scala-2.11.1.tgz
tar xvf scala-2.11.1.tgz
cd scala-2.11.1
export SCALA_HOME=$PWD
Nodejs
wget https://nodejs.org/dist/v4.2.1/node-v4.2.1.tar.gz
tar xvf node-v4.2.1.tar.gz
cd node-v4.2.1
./configure --prefix=/place/to/install/node
make -j<NUMCORES>
make install
cd /place/to/install/node/bin
export PATH=$PWD:$PATH
Protobuf
wget https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz
tar xvf protobuf-2.6.1.tar.gz
cd protobuf-2.6.1
./configure --prefix=/place/to/install/protobuf
make -j<NUMCORES>
make install
cd /place/to/install/protobuf/bin
export PATH=$PWD:$PATH
cd /place/to/install/protobuf/lib/pkgconfig
export PKG_CONFIG_PATH=$PWD
Type the following to check the installation (it should report version 2.6.1):
protoc --version
Maven 3.3.9 (for Debian Jessie only)
wget http://mirror.ox.ac.uk/sites/rsync.apache.org/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
tar xvf apache-maven-3.3.9-bin.tar.gz
cd apache-maven-3.3.9/bin
export PATH=$PWD:$PATH
Build Procedure
git clone https://git.linaro.org/leg/bigdata/bigtop-odpi.git/
cd bigtop-odpi
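Before starting the build, it may be worth confirming that the tools installed in the previous sections are visible on the PATH. A minimal sketch; the function name missing_tools is hypothetical and the list of tools to pass is up to the reader.

```shell
# Print the name of every listed command that cannot be found on the PATH.
# Prints nothing when all of them are available.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, missing_tools git java mvn node protoc prints nothing when all five commands are found, and otherwise lists the ones that still need to be installed or added to PATH.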
Clean up temporary files before starting the build.
./gradlew clean
If an earlier build was done, it might be necessary to delete the ~/.gradle and ~/.m2 folders and clean again.
rm -r ~/.gradle
rm -r ~/.m2
./gradlew clean
Run the command below to create the RPM packages.
./gradlew spark-rpm
The above command generates all the configuration and spec files necessary for the build and then generates the Spark RPM packages.
Once the build is successful, the RPM files will be placed in ./bigtop-packages/src/rpm/spark.
List of Spark RPM files that will be generated:
spark-core-2.0.0-1.el7.centos.noarch.rpm
spark-history-server-2.0.0-1.el7.centos.noarch.rpm
spark-thriftserver-2.0.0-1.el7.centos.noarch.rpm
spark-datanucleus-2.0.0-1.el7.centos.noarch.rpm
spark-master-2.0.0-1.el7.centos.noarch.rpm
spark-worker-2.0.0-1.el7.centos.noarch.rpm
spark-extras-2.0.0-1.el7.centos.noarch.rpm
spark-python-2.0.0-1.el7.centos.noarch.rpm
spark-yarn-shuffle-2.0.0-1.el7.centos.noarch.rpm
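The list above can be checked against the build output with a small script. This is a sketch under assumptions: the function name missing_spark_rpms is hypothetical, the directory to scan is passed in by the caller, and the glob accepts any release/distribution suffix rather than only 1.el7.centos.

```shell
# Print each expected Spark package name for which no RPM exists in the
# given directory. Arguments: <directory> [version, default 2.0.0]
missing_spark_rpms() {
    local dir=$1 version=${2:-2.0.0} pkg
    for pkg in core history-server thriftserver datanucleus master worker \
               extras python yarn-shuffle; do
        # ls fails when the glob matches nothing, which triggers the echo.
        ls "$dir"/spark-"$pkg"-"$version"-*.noarch.rpm >/dev/null 2>&1 \
            || echo "spark-$pkg"
    done
}
```

Calling missing_spark_rpms with the directory that holds the generated .rpm files prints nothing when all nine packages are present.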
Run the command below to create the DEB packages.
./gradlew spark-deb
The above command generates all the configuration and packaging files necessary for the build and then generates the Spark DEB packages.
NOTE: By default, the Spark 2.0.0 build uses the Zinc server, which is optional. If the build fails because of the Zinc server, kill the Zinc server process and rerun the build.
Installing BigTop ODPi Spark2.0.0 (via RPM)
To install, first build the RPM packages if this has not been done already:
./gradlew spark-rpm
Then enter the directory containing the Spark 2.0.0 .rpm files and run "yum install *.rpm" as root. Spark 2.0.0 requires Hadoop 2.7.2 to be installed as well, so install all the dependent components.
SparkPi
The command "run-example SparkPi 10" runs the SparkPi test. The argument 10 is the number of slices (partitions) the computation is split into, not a repeat count. Check the trace/log: if the test passed, the log contains no "error". Spark depends on the Hadoop libraries, so make sure they have been added to the SPARK_CLASSPATH environment variable.
export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/usr/lib/hadoop/hadoop-common-2.7.2.jar:/usr/lib/hadoop/client/hadoop-hdfs-2.7.2.jar:/usr/lib/spark/jars:/usr/lib/hadoop/hadoop-annotations-2.7.2.jar:/usr/lib/hadoop/hadoop-auth-2.7.2.jar:/usr/lib/hadoop/hadoop-nfs-2.7.2.jar:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-client-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-api-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-common-2.7.2.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-web-proxy-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.2.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-2.7.2.jar:.
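As an alternative to listing every jar by hand, the classpath can be assembled from the Hadoop directories, and the log check can be automated. This is only a sketch: the function names jars_classpath and sparkpi_log_ok are hypothetical, and unlike the hand-picked list above, jars_classpath adds every .jar it finds in the given directories. The log check relies on SparkPi printing a line starting with "Pi is roughly".

```shell
# Join every *.jar found directly under the given directories into one
# colon-separated string. Directories that do not exist are skipped.
jars_classpath() {
    local dir jar cp=""
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            [ -e "$jar" ] || continue   # unmatched glob or missing directory
            cp="${cp:+$cp:}$jar"
        done
    done
    printf '%s\n' "$cp"
}

# Succeed only if the log reports a result and contains no "error" (any case).
sparkpi_log_ok() {
    grep -q 'Pi is roughly' "$1" && ! grep -qi 'error' "$1"
}
```

One way to use it, taking the directory names from the export line above: set SPARK_CLASSPATH to the output of jars_classpath /usr/lib/hadoop /usr/lib/hadoop-hdfs /usr/lib/hadoop-yarn /usr/lib/hadoop-mapreduce, run run-example SparkPi 10 with its output redirected (2>&1) into a file such as sparkpi.log, and then call sparkpi_log_ok sparkpi.log.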