...

Installation

For

...

Debian:

Add to repo source list (not required if you are using the installer from the Reference Platform): 

...

 

Code Block
languagebash
$ sudo apt-get update
$ sudo apt-get build-dep build-essential

 

 

Check the Java version:

 

Code Block
languagebash
java -version

 

This should report an OpenJDK 8 version.
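
Typical output looks like the following (a sample; the exact build string will vary by platform and patch level):

Code Block
languagetext
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)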

...

Install Hadoop, Spark and Hive

 

Code Block
languagebash
$ sudo apt-get install -ft jessie bigtop-tomcat bigtop-utils hadoop* spark-core zookeeper ^hive-* hbase oozie

 

For CentOS:

Add to repo source list (not required if you are using the installer from the Reference Platform): 

...

 

Code Block
languagebash
$ sudo yum update
$ sudo yum -y install openssh-server openssh-clients java-1.8.0-openjdk*

 

 

Install Hadoop, Spark and Hive

 

Code Block
languagebash
$ sudo yum install -y hadoop* spark* hive*

 

Verifying Hadoop Installation

...

 

Code Block
languagebash
$ sudo adduser hduser -G hadoop

 

Give hduser a password:

 

Code Block
languagebash
$ sudo passwd hduser

 

Add hduser to sudoers list:  

...

 

Code Block
languagebash
$ sudo adduser hduser sudo

 

On CentOS: 

Code Block
languagebash
$ sudo usermod -aG wheel hduser   # -a appends, preserving existing group memberships (e.g. hadoop)
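
To confirm the memberships took effect, list hduser's groups (a quick check):

Code Block
languagebash
$ id hduser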

...

 

Code Block
languagebash
$ su - hduser

 

Generate ssh key for hduser

...
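
The key-generation command itself is elided above; a typical passwordless invocation (an assumption, not necessarily the original command) is:

Code Block
languagebash
$ ssh-keygen -t rsa -P "" -f $HOME/.ssh/id_rsa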

 

Code Block
languagebash
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ chmod 600 $HOME/.ssh/authorized_keys
$ chmod 700 $HOME/.ssh 

 

Test ssh setup

Code Block
languagebash
$ ssh localhost
$ exit

...

 

Code Block
languagebash
$ sudo sysctl -p

 

Configuring the app environment

...

 

Code Block
languagebash
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
$ sudo chmod 750 /app/hadoop/tmp
$ sudo chown hduser:hadoop /usr/lib/hadoop
$ sudo chmod 750 /usr/lib/hadoop

 

Setting up Environment Variables

...

 

Code Block
languagebash
$ vi .bashrc

 

Add the following to the end and save:  

...


Code Block
languagebash
$ source .bashrc

 

Modifying config files

core-site.xml

...

 

Code Block
languagetext
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
 scheme and authority determine the FileSystem implementation. The
 uri's scheme determines the config property (fs.SCHEME.impl) naming
 the FileSystem implementation class. The uri's authority is used to
 determine the host, port, etc. for a filesystem.
  </description>
</property>

 

Add this at the bottom, just before the closing </configuration> tag:

 

Code Block
languagetext
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
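
To confirm Hadoop picks up the edited values, you can query a key directly; this should echo hdfs://localhost:54310 (a quick sanity check, assuming the hadoop client is on the PATH):

Code Block
languagebash
$ hdfs getconf -confKey fs.default.name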

 

mapred-site.xml

Code Block
languagebash
$ sudo vi /etc/hadoop/conf/mapred-site.xml

...

 

Code Block
languagetext
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
 at. If "local", then jobs are run in-process as a single map
 and reduce task.
  </description>
</property>

 

hdfs-site.xml

 

Code Block
languagebash
$ sudo vi /etc/hadoop/conf/hdfs-site.xml

 

Modify the existing property as below:

 

Code Block
languagetext
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications
 can be specified when the file is created. The default is used if replication
 is not specified at create time.
  </description>
</property>

 

  

Make sure the following properties are set correctly in hdfs-site.xml:

 

Code Block
languagetext
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}</value>
</property>


<property>
  <name>dfs.namenode.name.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/name</value>
</property>
 
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/namesecondary</value>
</property>
 
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/data</value>
</property>

 

  

Make sure the following properties are also present:

 

Code Block
languagetext
<property>
  <name>dfs.name.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/nn</value>
</property>
 
<property>
  <name>dfs.data.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/dn</value>
</property>


<property>
  <name>dfs.permissions.supergroup</name>
  <value>hadoop</value>
</property>

 

Format Namenode

This step is needed only the first time you set up the cluster. Re-running it will destroy the contents of HDFS.

 

Code Block
languagebash
$ sudo /etc/init.d/hadoop-hdfs-namenode init
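
Because re-formatting destroys HDFS data, you can guard the command so it only runs on an unformatted namenode. A minimal sketch (the path assumes the dfs.namenode.name.dir value configured above, with ${user.name} resolving to hdfs for the daemon):

Code Block
languagebash
# Format only if no 'current' directory exists yet
if [ ! -d /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current ]; then
    sudo /etc/init.d/hadoop-hdfs-namenode init
fi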

 

Start the HDFS and YARN daemons

Code Block
languagebash
$ for i in hadoop-hdfs-namenode hadoop-hdfs-datanode ; do sudo service $i start ; done
 
$ sudo /etc/init.d/hadoop-yarn-resourcemanager start
$ sudo /etc/init.d/hadoop-yarn-nodemanager start

...

 

Code Block
languagebash
$ sudo jps  

 

or  

 

Code Block
languagebash
$ ps aux | grep java

 

  

Alternatively, check whether the YARN managers are running:

 

Code Block
languagebash
$ sudo /etc/init.d/hadoop-yarn-resourcemanager status
$ sudo /etc/init.d/hadoop-yarn-nodemanager status

 

You should see output like the below:

...
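
The creation of the local 'in' directory is elided above; a minimal sample input (a sketch, any small text files will do) can be created like this before copying it to HDFS:

Code Block
languagebash
$ mkdir in
$ echo "hello world hello hadoop" > in/sample.txt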

 

Code Block
languagebash
$ hadoop dfs -copyFromLocal in /in  

 

Run the provided wordcount example

 

Code Block
languagebash
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /in /out  

 

Check the output

Code Block
languagebash
$ hadoop dfs -cat /out/*
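
With the sample input sketched above, the output would resemble the following (counts depend entirely on your input):

Code Block
languagetext
hadoop	1
hello	2
world	1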

...

 

Code Block
languagebash
$ su - hduser

 

Configuring Spark

Code Block
languagebash
 export HADOOP_HOME=/usr/lib/hadoop 
 export HADOOP_PREFIX=$HADOOP_HOME 
 export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native" 
 export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec 
 export HADOOP_CONF_DIR=/etc/hadoop/conf
 export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
 export HADOOP_COMMON_HOME=$HADOOP_HOME 
 export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce 
 export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs 
 export YARN_HOME=/usr/lib/hadoop-yarn 
 export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn/ 
 export HADOOP_USER_NAME=hdfs 
 export CLASSPATH=$CLASSPATH:. 
 export CLASSPATH=$CLASSPATH:$HADOOP_HOME/hadoop-common-2.7.2.jar:$HADOOP_HOME/client/hadoop-hdfs-2.7.2.jar:$HADOOP_HOME/hadoop-auth-2.7.2.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/hive/lib/*:/usr/lib/hadoop/lib/*: 
 export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::") 
 export PATH=/usr/lib/hadoop/libexec:/etc/hadoop/conf:$HADOOP_HOME/bin/:$PATH 
 export SPARK_HOME=/usr/lib/spark 
 export PATH=$HADOOP_HOME/bin:$PATH 
 export SPARK_DIST_CLASSPATH=$HADOOP_HOME/bin/hadoop:$CLASSPATH:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-mapreduce/*:. 
 export CLASSPATH=$CLASSPATH:/usr/lib/hadoop/lib/*:.

...

 

Code Block
languagebash
 $ source .bashrc

 

Verifying Spark Installation

 

Code Block
languagebash
$ $SPARK_HOME/bin/spark-shell --master local[*]
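
For a non-interactive smoke test, you can pipe a one-liner into the shell (a sketch; any small action that exercises the runtime works):

Code Block
languagebash
$ echo 'sc.parallelize(1 to 1000).sum()' | $SPARK_HOME/bin/spark-shell --master local[*]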

 

Running SparkPi Example  

...

 

Code Block
languagebash
$ $SPARK_HOME/bin/run-example SparkPi 100
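
Amid the log output you should see a line like "Pi is roughly 3.14..."; the exact value varies from run to run.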

 

HIVE

Setting up environment for Hive 

...

 

Code Block
languagebash
$ source ~/.bashrc

 

Configuring Hive

To configure Hive with Hadoop, edit the hive-env.sh file, which resides in the $HIVE_HOME/conf directory. The following commands change to the Hive config folder and copy the template file:

 

Code Block
languagebash
$ cd $HIVE_HOME/conf
$ sudo cp hive-env.sh.template hive-env.sh   
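
At minimum, hive-env.sh usually needs to point at the Hadoop installation; a minimal sketch (the path follows this guide's layout):

Code Block
languagebash
# Append to $HIVE_HOME/conf/hive-env.sh
export HADOOP_HOME=/usr/lib/hadoop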

 

Hive installation is now complete. Next, the Metastore requires an external database server; this guide uses Apache Derby.

...

 

Code Block
languagebash
$ cd ~
$ wget http://archive.apache.org/dist/db/derby/db-derby-10.4.2.0/db-derby-10.4.2.0-bin.tar.gz

 

 

The following command is used to verify the download:  

...

 

Code Block
languagebash
$ tar zxvf db-derby-10.4.2.0-bin.tar.gz
$ ls

 

  

On successful download, you will see the following response:

...

 

Code Block
languagebash
$ sudo mv db-derby-10.4.2.0-bin /usr/local/derby

 

Setting up Environment for Derby

...

 

Code Block
languagebash
 export DERBY_HOME=/usr/local/derby
 export PATH=$PATH:$DERBY_HOME/bin
 export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar


Code Block
languagebash
$ source ~/.bashrc
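
Derby ships a sysinfo utility in its bin directory; running it is a quick way to confirm the new PATH entry works:

Code Block
languagebash
$ sysinfo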

 

Create a directory to store Metastore

...

 

Code Block
languagebash
$ sudo mkdir $DERBY_HOME/data

 

Derby installation and environment setup are now complete.

...

 

Code Block
languagebash
$ cd $HIVE_HOME/conf
$ sudo cp hive-default.xml.template hive-site.xml

 

Edit hive-site.xml: find the entry 'javax.jdo.option.ConnectionURL', modify its value, and set the following properties as below:

 

Code Block
languagetext
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
</property>

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/${user.name}</value>
</property>

<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/${user.name}_resources</value>
</property>

<property>
  <name>hive.scratch.dir.permission</name>
  <value>733</value>
</property>

 

Then change the values of the following properties as shown below:

...

 

Code Block
languagebash
$ sudo service hive-metastore start
$ sudo $HIVE_HOME/bin/metatool -listFSRoot
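
On success, metatool prints the warehouse FS root, with output resembling the following (exact wording varies by Hive version; host and port per core-site.xml above):

Code Block
languagetext
Listing FS Roots..
hdfs://localhost:54310/user/hive/warehouse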

 

Create a tmp directory to run Hive under:

 

Code Block
languagebash
$ cd $HIVE_HOME
$ sudo mkdir tmp
$ sudo chown hduser tmp
$ cd tmp

 

The following commands are used to verify the Hive installation:

 

Code Block
languagebash
$ $HIVE_HOME/bin/schematool -dbType derby -initSchema
$ hive -hiveconf hive.root.logger=DEBUG,console

 

On successful installation of Hive, you will see a response like the following:

...

 

Code Block
languagebash
$ exit
$ sudo userdel hduser
$ sudo useradd -d /home/hduser -G hadoop -m hduser

 

  • If TeraGen, TeraSort, and TeraValidate error out with a 'permission denied' exception, take the following steps:

 

Code Block
languagebash
$ sudo groupadd supergroup
$ sudo usermod -g supergroup hduser

 

  • If, for some reason, you notice that the config files (core-site.xml, hdfs-site.xml, etc.) are empty:

...

 

Code Block
languagebash
$ sudo vi /etc/hosts

 

The hosts file should look like the below:

 

Code Block
languagetext
127.0.0.1 <hostname> localhost localhost.localdomain   # <hostname> is the output of the 'hostname' command

::1 localhost

 

Also try the following steps:  

...

 

Code Block
languagebash
$ cd $HIVE_HOME/tmp
$ mv metastore_db metastore_db.tmp
$ ../bin/schematool -initSchema -dbType derby

 

   

  • If you get the following error with Hive:  

...