...
Debian Jessie - http://repo.linaro.org/debian/erp-16.12-stable/
CentOS 7 - http://repo.linaro.org/rpm/linaro-staging/centos-7
Installation
...
For Debian:
Add the repository to the apt source list (not required if you are using the installer from the Reference Platform):
...
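The exact entry is elided above; based on the repository URL at the top of this page, a plausible line for /etc/apt/sources.list would be the following (the suite and component names here are assumptions):
Code Block
deb http://repo.linaro.org/debian/erp-16.12-stable/ jessie main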
Code Block
$ sudo apt-get update
$ sudo apt-get build-dep build-essential
Check Java version
Code Block
$ java -version
This should report OpenJDK 8.
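For illustration only, the output looks roughly like the following; the exact version and build strings will differ on your system:
Code Block
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2~bpo8+1-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)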
...
Install Hadoop, Spark and Hive
Code Block
$ sudo apt-get install -ft jessie bigtop-tomcat bigtop-utils hadoop* spark-core zookeeper ^hive-* hbase oozie
For CentOS:
Add the repository to the yum source list (not required if you are using the installer from the Reference Platform):
...
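The exact repo entry is elided above; a sketch of a yum repo file based on the URL at the top of this page might look like the following (the file name, repo id, and gpgcheck setting are assumptions):
Code Block
$ sudo tee /etc/yum.repos.d/linaro-staging.repo <<'EOF'
[linaro-staging]
name=Linaro staging
baseurl=http://repo.linaro.org/rpm/linaro-staging/centos-7
enabled=1
gpgcheck=0
EOF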
Code Block
$ sudo yum update
$ sudo yum -y install openssh-server openssh-clients java-1.8.0-openjdk*
Install Hadoop, Spark and Hive
Code Block
$ sudo yum install -y hadoop* spark* hive*
Verifying Hadoop Installation
...
Code Block
$ sudo adduser hduser -G hadoop
Give hduser a password:
Code Block
$ sudo passwd hduser
Add hduser to the sudoers list:
...
Code Block
$ sudo adduser hduser sudo
On CentOS:
Code Block
$ sudo usermod -aG wheel hduser   # -a appends wheel without dropping the hadoop group
...
Code Block
$ su - hduser
Generate an SSH key for hduser:
...
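The key-generation command itself is elided above; a typical invocation that produces the id_rsa.pub consumed in the next step would be (a passphrase-less key is an assumption of this sketch):
Code Block
$ ssh-keygen -t rsa -P "" -f $HOME/.ssh/id_rsa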
Code Block
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ chmod 600 $HOME/.ssh/authorized_keys
$ chmod 700 $HOME/.ssh
Test the SSH setup:
Code Block
$ ssh localhost
$ exit
...
Code Block
$ sudo sysctl -p
Configuring the app environment
...
Code Block
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
$ sudo chmod 750 /app/hadoop/tmp
$ sudo chown hduser:hadoop /usr/lib/hadoop
$ sudo chmod 750 /usr/lib/hadoop
Setting up Environment Variables
...
Code Block
$ vi .bashrc
Add the following to the end and save:
...
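The exact variables are elided above; at a minimum, guides of this kind export something like the following, reusing the paths that appear in the Spark section later on this page (treat this as a sketch, not the authoritative list):
Code Block
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PATH=$PATH:$HADOOP_HOME/bin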
Code Block
$ source .bashrc
Modifying config files
core-site.xml
...
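By analogy with the mapred-site.xml and hdfs-site.xml steps below, the file to edit is presumably:
Code Block
$ sudo vi /etc/hadoop/conf/core-site.xml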
Code Block
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
Add this at the bottom, just before the closing </configuration> tag:
Code Block
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
mapred-site.xml
Code Block
$ sudo vi /etc/hadoop/conf/mapred-site.xml
...
Code Block
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
hdfs-site.xml
Code Block
$ sudo vi /etc/hadoop/conf/hdfs-site.xml
Modify the existing property as below:
Code Block
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
Make sure the following properties are set correctly in hdfs-site.xml:
Code Block
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/name</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/namesecondary</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/data</value>
</property>
Make sure the following properties are also present:
Code Block
<property>
  <name>dfs.name.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/nn</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/dn</value>
</property>
<property>
  <name>dfs.permissions.supergroup</name>
  <value>hadoop</value>
</property>
Format the Namenode
This step is only needed the first time. Formatting again on an existing installation will result in loss of all content on HDFS.
Code Block
$ sudo /etc/init.d/hadoop-hdfs-namenode init
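If the init-script helper is not available on your system, the equivalent stock Hadoop command would presumably be the following (an alternative, not the method this guide uses; it assumes the hdfs service user created by the packages):
Code Block
$ sudo -u hdfs hdfs namenode -format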
Start the HDFS and YARN daemons
Code Block
$ for i in hadoop-hdfs-namenode hadoop-hdfs-datanode ; do sudo service $i start ; done
$ sudo /etc/init.d/hadoop-yarn-resourcemanager start
$ sudo /etc/init.d/hadoop-yarn-nodemanager start
...
Code Block
$ sudo jps
or
Code Block
$ ps aux | grep java
Alternatively, check if the YARN managers are running:
Code Block
$ sudo /etc/init.d/hadoop-yarn-resourcemanager status
$ sudo /etc/init.d/hadoop-yarn-nodemanager status
You should see output like the below:
...
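The command below assumes a local directory named in already exists; for a quick smoke test you could create one with some sample text first (the file name and contents here are arbitrary):
Code Block
$ mkdir in
$ echo "hello hadoop hello" > in/sample.txt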
Code Block
$ hadoop dfs -copyFromLocal in /in
Run the provided wordcount example:
Code Block
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /in /out
Check the output
Code Block
$ hadoop dfs -cat /out/*
...
Code Block
$ su - hduser
Configuring Spark
Add the following environment variables to the end of .bashrc:
Code Block
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"
export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
export YARN_HOME=/usr/lib/hadoop-yarn
export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn/
export HADOOP_USER_NAME=hdfs
export CLASSPATH=$CLASSPATH:.
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/hadoop-common-2.7.2.jar:$HADOOP_HOME/client/hadoop-hdfs-2.7.2.jar:$HADOOP_HOME/hadoop-auth-2.7.2.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/hive/lib/*:/usr/lib/hadoop/lib/*:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export PATH=/usr/lib/hadoop/libexec:/etc/hadoop/conf:$HADOOP_HOME/bin/:$PATH
export SPARK_HOME=/usr/lib/spark
# note: the original used Windows-style backslashes in the two lines below; corrected to forward slashes
export PATH=$HADOOP_HOME/bin:$PATH
export SPARK_DIST_CLASSPATH=$HADOOP_HOME/bin/hadoop:$CLASSPATH:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-mapreduce/*:.
export CLASSPATH=$CLASSPATH:/usr/lib/hadoop/lib/*:.
...
Code Block
$ source .bashrc
Verifying Spark Installation
Code Block
$ $SPARK_HOME/bin/spark-shell --master local[*]
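Once the shell comes up, a quick non-interactive sanity check is to pipe a one-liner through it; this sums a small RDD and should print 500500.0 (this check is an extra suggestion, not part of the original steps):
Code Block
$ echo 'println(sc.parallelize(1 to 1000).sum())' | $SPARK_HOME/bin/spark-shell --master local[*]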
Running SparkPi Example
...
Code Block
$ $SPARK_HOME/bin/run-example SparkPi 100
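On success the job prints an approximation of pi on stdout; you can filter for it as below (the printed value varies slightly from run to run):
Code Block
$ $SPARK_HOME/bin/run-example SparkPi 100 2>&1 | grep "Pi is roughly"
Pi is roughly 3.1415...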
HIVE
Setting up environment for Hive
...
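The exports themselves are elided above; by analogy with the Derby section below, they presumably look something like the following (the paths are assumptions based on the package install locations used elsewhere on this page):
Code Block
export HIVE_HOME=/usr/lib/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/*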
Code Block
$ source ~/.bashrc
Configuring Hive
To configure Hive with Hadoop, you need to edit the hive-env.sh file, which is located in the $HIVE_HOME/conf directory. The following commands change to the Hive config directory and copy the template file:
Code Block
$ cd $HIVE_HOME/conf
$ sudo cp hive-env.sh.template hive-env.sh
Hive is now installed. Next, you need an external database server to configure the Metastore; we use the Apache Derby database.
...
Code Block
$ cd ~
$ wget http://archive.apache.org/dist/db/derby/db-derby-10.4.2.0/db-derby-10.4.2.0-bin.tar.gz
The following command is used to verify the download:
...
Code Block
$ tar zxvf db-derby-10.4.2.0-bin.tar.gz
$ ls
On successful extraction, you will see the following response:
...
Code Block
$ sudo mv db-derby-10.4.2.0-bin /usr/local/derby
Setting up Environment for Derby
...
Code Block
export DERBY_HOME=/usr/local/derby
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
Code Block
$ source ~/.bashrc
Create a directory to store the Metastore
...
Code Block
$ sudo mkdir $DERBY_HOME/data
Derby installation and environment setup are now complete.
...
Code Block
$ cd $HIVE_HOME/conf
$ sudo cp hive-default.xml.template hive-site.xml
Edit hive-site.xml, find the entry 'javax.jdo.option.ConnectionURL', and modify the value as below:
Code Block
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/${user.name}</value>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/${user.name}_resources</value>
</property>
<property>
  <name>hive.scratch.dir.permission</name>
  <value>733</value>
</property>
and change the values of the following properties as shown:
...
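The ConnectionURL value referenced earlier is elided here; for a Derby-backed metastore served by the Derby network server it conventionally looks like the following (the host, port, and database name are assumptions):
Code Block
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
</property>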
Code Block
$ sudo service hive-metastore start
$ sudo $HIVE_HOME/bin/metatool -listFSRoot
Create a tmp directory to run Hive under:
Code Block
$ cd $HIVE_HOME
$ sudo mkdir tmp
$ sudo chown hduser tmp
$ cd tmp
The following commands are used to verify Hive installation:
Code Block
$ $HIVE_HOME/bin/schematool -dbType derby -initSchema
$ hive -hiveconf hive.root.logger=DEBUG,console
On successful installation of Hive, you will see the following response:
...
Code Block
$ exit
$ sudo userdel hduser
$ sudo useradd -d /home/hduser -G hadoop -m hduser
- If TeraGen, TeraSort and TeraValidate error out with a 'permission denied' exception, the following steps can be taken:
Code Block
$ sudo groupadd supergroup
$ sudo usermod -g supergroup hduser
- If for some reason the config files (core-site.xml, hdfs-site.xml, etc.) are empty:
...
Code Block
$ sudo vi /etc/hosts
The hosts file should look like below:
Code Block
127.0.0.1   <hostname> localhost localhost.localdomain   # <hostname> should be the output of $ hostname
::1         localhost
Also try the following steps:
...
Code Block
$ cd $HIVE_HOME/tmp
$ mv metastore_db metastore_db.tmp
$ ../bin/schematool -initSchema -dbType derby
- If you get the following error with Hive:
...