
Spark Bench


Build and Install

1. git clone https://github.com/SparkTC/spark-bench.git

2. Follow the steps in the Getting Started guide at the following link to install and configure spark-bench: https://github.com/SparkTC/spark-bench/

3. Make sure Hadoop and Spark are installed and running. Note: ODPi Hadoop won't work, as its file structure is different and not compatible with traditional Hadoop.
Also, make sure the jblas library is included in Spark, as it is needed for Matrix Factorization.

4. Once Hadoop and Spark are up and running and Spark-Bench is installed, try out the SQL benchmark.
4.1 Run the gen_data.sh script

$ <SPARK_BENCH_HOME>/SQL/bin/gen_data.sh
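The verification described in the next step can also be scripted. Below is a minimal, illustrative helper (the function name is made up, not part of Spark-Bench); it checks a local path, whereas against HDFS you would list the directory with `hdfs dfs -ls /SparkBench/sql/Input` instead:

```shell
# Illustrative helper (not part of Spark-Bench): report whether a data
# directory contains any files. For HDFS, replace the `ls -A` call with
# an `hdfs dfs -ls` invocation.
check_nonempty() {
  dir="$1"
  if [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "OK: $dir has data"
  else
    echo "MISSING: $dir is empty or absent"
  fi
}

check_nonempty /tmp   # prints OK or MISSING depending on the directory
```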


Check whether the sample data sets were created under /SparkBench/sql/Input in HDFS.
If not, there is a bug in the Spark-Bench scripts that can be fixed with the following steps:
- Open <SPARK_BENCH_HOME>/bin/funcs.sh and search for the function 'CPFROM'.
- In the last else block, replace the two occurrences of the ${src} variable with this: ${src:8}
- This problem was spotted by a colleague at AMD, who has submitted a patch here: https://github.com/SparkTC/spark-bench/pull/34
- After making these changes, run the gen_data.sh script again and check whether the input data is created in HDFS this time. Then proceed to the next step.
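For reference, `${src:8}` is plain Bash substring expansion: it drops the first 8 characters of the variable's value, presumably a scheme prefix on the source URL. A standalone illustration (the path is a made-up example, not taken from Spark-Bench):

```shell
# Bash substring expansion: ${var:offset} drops the first `offset` characters.
# An offset of 8 strips a leading "file:///" scheme prefix (8 characters).
src="file:///tmp/spark-input"   # hypothetical example value
echo "${src:8}"                 # prints: tmp/spark-input
```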

4.2 Run the run.sh script

$ <SPARK_BENCH_HOME>/SQL/bin/run.sh


After the run completes, check <SPARK_BENCH_HOME>/num/bench-report.dat to see whether the run was successful.
