
Spark Bench


Build and Install

1. git clone https://github.com/SparkTC/spark-bench.git

2. Follow the steps in the Getting Started guide at the following link to install and configure spark-bench: https://github.com/SparkTC/spark-bench/

3. Make sure Hadoop and Spark are installed and running. Note: ODPi Hadoop won't work, as its file structure is different and not compatible with traditional Hadoop.
Also, make sure the jblas library is included in Spark, as it is needed for Matrix Factorization.

4. Once Hadoop and Spark are up and running and Spark-Bench is installed, try out the SQL benchmark.
4.1 Run the gen_data.sh script

$ <SPARK_BENCH_HOME>/SQL/bin/gen_data.sh
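The verification described in the next step can also be scripted. Below is a minimal, illustrative helper (the function name is made up, not part of Spark-Bench); it checks a local path, whereas against HDFS you would list the directory with `hdfs dfs -ls /SparkBench/sql/Input` instead:

```shell
# Illustrative helper (not part of Spark-Bench): report whether a data
# directory contains any files. For HDFS, replace the `ls -A` call with
# an `hdfs dfs -ls` invocation.
check_nonempty() {
  dir="$1"
  if [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "OK: $dir has data"
  else
    echo "MISSING: $dir is empty or absent"
  fi
}

check_nonempty /tmp   # prints OK or MISSING depending on the directory
```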


Check whether the sample data sets were created under /SparkBench/sql/Input in HDFS.
If not, there is a bug in the Spark-Bench scripts that can be fixed with the following steps:
- Open <SPARK_BENCH_HOME>/bin/funcs.sh and search for the function 'CPFROM'.
- In the last else block, replace the two occurrences of the ${src} variable with this: ${src:8}
- This problem was spotted by a colleague at AMD, who has submitted a patch here: https://github.com/SparkTC/spark-bench/pull/34
- After making these changes, run the gen_data.sh script again and check whether the input data is created in HDFS this time. Then proceed to the next step.
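For reference, `${src:8}` is plain Bash substring expansion: it drops the first 8 characters of the variable's value, presumably a scheme prefix on the source URL. A standalone illustration (the path is a made-up example, not taken from Spark-Bench):

```shell
# Bash substring expansion: ${var:offset} drops the first `offset` characters.
# An offset of 8 strips a leading "file:///" scheme prefix (8 characters).
src="file:///tmp/spark-input"   # hypothetical example value
echo "${src:8}"                 # prints: tmp/spark-input
```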

4.2 Run the run.sh script

$ <SPARK_BENCH_HOME>/SQL/bin/run.sh


After the run completes, check <SPARK_BENCH_HOME>/num/bench-report.dat to see whether the run was successful.
