References
https://github.com/intel-hadoop/HiBench/wiki/Getting-Started
Build and Install
$ git clone https://github.com/intel-hadoop/HiBench.git
$ cd HiBench/src
$ mvn clean package -D spark1.6.1 -D MR2    # Spark version changed from the default
$ cd ../conf
$ cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf
Set the following configuration parameters in 99-user_defined_properties.conf:
hibench.hadoop.home <Hadoop installation location>
hibench.spark.home <Spark installation location>
hibench.hdfs.master hdfs://<host>:8020
hibench.spark.master spark://<host>:7077
# Set this in addition to the parameters above, because HiBench was not able to detect the Hadoop version
hibench.hadoop.version hadoop2
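A missing or misspelled parameter only shows up later as a traceback from load-config.py, so it can help to check the file before running a workload. The following is a minimal sketch, not part of HiBench: it assumes the configuration file consists of `key value` lines separated by whitespace, with `#` starting a comment line. The helper names and the sample values are hypothetical.

```python
# Sketch: report which of the parameters listed above are missing from a
# HiBench-style configuration text. Assumes "key<whitespace>value" lines.

REQUIRED_KEYS = [
    "hibench.hadoop.home",
    "hibench.spark.home",
    "hibench.hdfs.master",
    "hibench.spark.master",
    "hibench.hadoop.version",
]


def parse_conf(text):
    """Parse 'key value' lines, skipping blank lines and '#' comment lines."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def missing_keys(conf, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not conf.get(key)]


# Hypothetical sample values, for illustration only.
sample = """\
# user-defined properties
hibench.hadoop.home     /opt/hadoop
hibench.spark.home      /opt/spark
hibench.hdfs.master     hdfs://namenode.example:8020
hibench.hadoop.version  hadoop2
"""

print(missing_keys(parse_conf(sample)))  # -> ['hibench.spark.master']
```

To check a real file, read it with `open(path).read()` and pass the text to `parse_conf`.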
Errors and Workarounds
Running WordCount Workload
$ workloads/wordcount/prepare/prepare.sh
'''Error 1:''' A required jar path could not be resolved while loading the configuration
$ workloads/wordcount/spark/scala/bin/run.sh
Traceback (most recent call last):
File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 556, in <module>
load_config(conf_root, workload_root, workload_folder, patching_config)
File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 161, in load_config
generate_optional_value()
File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 374, in generate_optional_value
HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf['hibench.hadoop.mapreduce.home'] + "/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient*-tests.jar")
File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 114, in OneAndOnlyOneFile
raise Exception("Need to match one and only one file!")
Exception: Need to match one and only one file!
/home/nbhoyar/HiBench/bin/functions/workload-functions.sh: line 39: .: filename argument required
.: usage: . filename [arguments]
'''Solution:''' Modified lines 358 and 374 in bin/functions/load-config.py so that they point to the actual location of the ODPi Hadoop example jar files.
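The traceback shows that the check fails when a glob pattern does not match exactly one file. Before editing load-config.py, it can be useful to see what a candidate pattern actually matches on the local installation. The sketch below is an illustration of that kind of check, not HiBench's own `OneAndOnlyOneFile` implementation. The function name `match_exactly_one`, the temporary directory, and the jar version in the demo are hypothetical.

```python
import glob
import os
import tempfile


def match_exactly_one(pattern):
    """Return the single path matching pattern; raise if 0 or several match."""
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise ValueError(
            "pattern %r matched %d file(s): %r" % (pattern, len(matches), matches)
        )
    return matches[0]


# Demo in a temporary directory, using a made-up jar version number.
demo_dir = tempfile.mkdtemp()
jar_name = "hadoop-mapreduce-client-jobclient-0.0.0-tests.jar"  # hypothetical
open(os.path.join(demo_dir, jar_name), "w").close()

pattern = os.path.join(demo_dir, "hadoop-mapreduce-client-jobclient*-tests.jar")
print(os.path.basename(match_exactly_one(pattern)))
```

Running the same function with the pattern built from the real Hadoop installation directory shows whether the path needs to change, and the error message lists every match when there are several.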
'''Error 2:''' Worker auto-probe through the Spark master's web UI failed (connection refused)
Traceback (most recent call last):
File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 556, in <module>
load_config(conf_root, workload_root, workload_folder, patching_config)
File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 161, in load_config
generate_optional_value()
File "/home/nbhoyar/HiBench/bin/functions/load-config.py", line 434, in generate_optional_value
assert 0, "Get workers from spark master's web UI page failed, reason:%s\nPlease check your configurations, network settings, proxy settings, or set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually to bypass auto-probe" % e
AssertionError: Get workers from spark master's web UI page failed, reason:[Errno socket error] [Errno 111] Connection refused
Please check your configurations, network settings, proxy settings, or set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually to bypass auto-probe
/home/nbhoyar/HiBench/bin/functions/workload-functions.sh: line 39: .: filename argument required
'''Solution:''' Started Spark (start-all and the history server) so that the master's web UI is reachable.
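A "Connection refused" error here means nothing was listening on the master's web UI port. A quick way to confirm that before rerunning the workload is a plain TCP connection test, sketched below. The host `localhost` and port 8080 (the default web UI port of a Spark standalone master) are assumptions and must be adjusted to the actual cluster. `is_reachable` is a hypothetical helper, not part of HiBench.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
    conn.close()
    return True


# Assumed host and port; replace with the Spark master of your cluster.
print(is_reachable("localhost", 8080))
```

If this prints `False`, the Spark master is not running or its web UI listens on a different host or port.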
'''Error 3:''' /home/nbhoyar/HiBench/bin/functions/workload-functions.sh: line 113: $1: unbound variable
'''Solution:''' Not resolved yet. Reported at: https://github.com/intel-hadoop/HiBench/issues/279