HiBench

References

https://github.com/intel-hadoop/HiBench/wiki/Getting-Started

Build and Install

$ git clone https://github.com/intel-hadoop/HiBench.git
$ cd HiBench/src
$ mvn clean package -D spark1.6.1 -D MR2 # Spark version changed from the default
$ cd ../conf
$ cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf

Set the following configuration parameters in 99-user_defined_properties.conf:

hibench.hadoop.home <Hadoop installation location>
hibench.spark.home <Spark installation location>
hibench.hdfs.master hdfs://<host>:8020
hibench.spark.master spark://<host>:7077
hibench.hadoop.version hadoop2 # Set this explicitly as well; HiBench was not able to detect the Hadoop version
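
A quick way to confirm that none of the parameters above was left out is to parse the file before launching a workload. The following is a minimal sketch, assuming the file uses whitespace-separated "key value" lines with `#` comments, as in the listing above. The helper names and the sample paths (/opt/hadoop, /opt/spark, namenode) are hypothetical and not part of HiBench.

```python
# Hypothetical helper, not part of HiBench: checks that the keys
# listed in this page are present in a "key value" style conf file.

REQUIRED_KEYS = (
    "hibench.hadoop.home",
    "hibench.spark.home",
    "hibench.hdfs.master",
    "hibench.spark.master",
    "hibench.hadoop.version",
)


def parse_hibench_conf(text):
    """Parse 'key value' lines, ignoring blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def missing_keys(conf):
    """Return the required keys that are absent or empty, in listed order."""
    return [key for key in REQUIRED_KEYS if not conf.get(key)]


# Sample content with placeholder paths and host names (assumptions).
sample = """
# user-defined overrides
hibench.hadoop.home     /opt/hadoop
hibench.spark.home      /opt/spark
hibench.hdfs.master     hdfs://namenode:8020
hibench.spark.master    spark://namenode:7077
"""

print(missing_keys(parse_hibench_conf(sample)))
# prints ['hibench.hadoop.version']
```

To check the real file, pass its contents instead of the sample, for example `parse_hibench_conf(open("conf/99-user_defined_properties.conf").read())`.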

Errors and Workarounds

Running WordCount Workload

$ workloads/wordcount/prepare/prepare.sh

_{'''Error 1:''' HiBench could not locate the Hadoop test jar ("Need to match one and only one file!") when running:}
_{workloads/wordcount/spark/scala/bin/run.sh}

Traceback (most recent call last):
  File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 556, in <module>
    load_config(conf_root, workload_root, workload_folder, patching_config)
  File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 161, in load_config
    generate_optional_value()
  File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 374, in generate_optional_value
    HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home'] + "/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient*-tests.jar")
  File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 114, in OneAndOnlyOneFile
    raise Exception("Need to match one and only one file!")
Exception: Need to match one and only one file!
/home/nbhoyar/HiBench/bin/functions/workload-functions.sh: line 39: .: filename argument required
.: usage: . filename [arguments]

'''Solution:''' Modified lines 358 and 374 of bin/functions/load-config.py so that the jar paths match the location of the example jar files in the ODPi Hadoop installation.
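
Before editing load-config.py, it helps to see what a candidate path pattern actually matches on disk. The sketch below is a stand-in for the check that fails in the traceback, not HiBench's own code: only the name OneAndOnlyOneFile and its exception message appear above, so `match_exactly_one` is a hypothetical equivalent. The temporary directory and the jar version 2.7.1 are placeholders for the demonstration.

```python
import glob
import os
import tempfile


def match_exactly_one(pattern):
    """Return the single path matching the glob pattern, or raise with details."""
    found = sorted(glob.glob(pattern))
    if len(found) != 1:
        raise Exception(
            "Need to match one and only one file! pattern=%r matched=%r"
            % (pattern, found)
        )
    return found[0]


# Demonstration in a throwaway directory standing in for the Hadoop
# MapReduce home (the layout and jar version are assumptions).
home = tempfile.mkdtemp()
jar_dir = os.path.join(home, "share", "hadoop", "mapreduce")
os.makedirs(jar_dir)
jar_name = "hadoop-mapreduce-client-jobclient-2.7.1-tests.jar"
open(os.path.join(jar_dir, jar_name), "w").close()

# Same suffix as the pattern shown in the traceback.
pattern = home + "/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient*-tests.jar"
print(os.path.basename(match_exactly_one(pattern)))
# prints hadoop-mapreduce-client-jobclient-2.7.1-tests.jar
```

Running `match_exactly_one` against the real installation with different candidate patterns shows whether a pattern matches zero files (wrong directory) or several files (pattern too loose). Both cases raise the same exception.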

_{'''Error 2:''' Probing the Spark master's web UI for workers failed (connection refused)}

Traceback (most recent call last):
  File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 556, in <module>
    load_config(conf_root, workload_root, workload_folder, patching_config)
  File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 161, in load_config
    generate_optional_value()
  File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 434, in generate_optional_value
    assert 0, "Get workers from spark master's web UI page failed, reason:%s\nPlease check your configurations, network settings, proxy settings, or set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually to bypass auto-probe" % e
AssertionError: Get workers from spark master's web UI page failed, reason:[Errno socket error] [Errno 111] Connection refused
Please check your configurations, network settings, proxy settings, or set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually to bypass auto-probe
/home/nbhoyar/HiBench/bin/functions/workload-functions.sh: line 39: .: filename argument required

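'''Solution:''' Started Spark (start-all and the history server) before running the workload. As the error message notes, setting `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually bypasses the auto-probe.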
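
A "Connection refused" at this stage usually means nothing is listening on the probed host and port. The following is a minimal sketch for checking that from the machine running HiBench. The helper names are hypothetical. The ports used in the usage note are assumptions based on the Spark standalone defaults, and the port HiBench actually probes is not confirmed by the traceback.

```python
import socket
from urllib.parse import urlparse


def split_master_url(url):
    """Split a URL such as spark://host:7077 into (host, port)."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.port


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

For example, with the value of `hibench.spark.master` in hand, `host, port = split_master_url("spark://<host>:7077")` followed by `is_port_open(host, port)` checks the master port, and `is_port_open(host, 8080)` checks the web UI on its default standalone port. If either returns False, start Spark first.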

_{'''Error 3:''' /home/nbhoyar/HiBench/bin/functions/workload-functions.sh: line 113: $1: unbound variable}

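'''Solution:''' Not yet resolved. Bash prints this message when an unset parameter is expanded while `set -u` is in effect, so the function at line 113 was most likely called without its first argument. Reported upstream at https://github.com/intel-hadoop/HiBench/issues/279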