HiveTestBench

About Hive TestBench

Information about Hive TestBench is available on the [[https://github.com/hortonworks/hive-testbench|Hive TestBench]] web page.

Compiling and Running Hive TestBench

Create a local clone

Clone the Hive TestBench repository:

$ git clone https://github.com/hortonworks/hive-testbench.git

Compile the benchmark

Compile the TPC-DS and TPC-H benchmarks using the build scripts, run from inside the cloned hive-testbench directory:

$ ./tpcds-build.sh 
$ ./tpch-build.sh

Choose the scale of data on which to run the tests. As a rough guide, clusters of fewer than 4 nodes should stay below 100-250 GB, clusters of 4-10 nodes should handle 1 TB, and large clusters can scale up to higher values.
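The sizing guidance above can be sketched as a small shell helper. This is only an illustration: the function name suggest_scale is hypothetical, it assumes the scale argument roughly corresponds to the data size in gigabytes, it picks the lower end (100) of the 100-250 GB range for small clusters, and the value 10000 for large clusters is an arbitrary placeholder rather than a recommendation.

```shell
#!/bin/sh
# Hypothetical helper: suggest a scale from the number of cluster nodes.
# Assumes the scale is roughly the data size in GB.
suggest_scale() {
    nodes=$1
    if [ "$nodes" -lt 4 ]; then
        echo 100      # fewer than 4 nodes: lower end of the 100-250 GB range
    elif [ "$nodes" -le 10 ]; then
        echo 1000     # 4-10 nodes: about 1 TB
    else
        echo 10000    # large clusters: placeholder, adjust to your hardware
    fi
}

# Usage: print the setup command for a 6-node cluster without running it.
SCALE=$(suggest_scale 6)
echo "./tpcds-setup.sh $SCALE"
```

Printing the command instead of executing it makes it easy to review the chosen scale before generating a large data set.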

Prepare data for the scale

For example, for a scale of 1000:

$ ./tpcds-setup.sh 1000 
$ ./tpch-setup.sh 1000

By default this creates tables in ORC format. You can choose a different format by setting the FORMAT environment variable:

FORMAT=<SEQUENCEFILE, TEXTFILE, RCFILE, ORC, PARQUET, or AVRO>

For example:

$ FORMAT=rcfile ./tpcds-setup.sh 30000
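Because a mistyped format name may only surface after a long data generation run, it can help to check the value first. The sketch below is a hypothetical wrapper check, not part of Hive TestBench: the function name is_valid_format is invented here, and treating the names case-insensitively is an assumption based on the lowercase rcfile example above.

```shell
#!/bin/sh
# Hypothetical check: accept only the format names listed above.
# Case-insensitive matching is an assumption, not documented behaviour.
is_valid_format() {
    fmt=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$fmt" in
        SEQUENCEFILE|TEXTFILE|RCFILE|ORC|PARQUET|AVRO) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: fall back to ORC (the default mentioned above) when FORMAT is unset.
FORMAT=${FORMAT:-ORC}
if is_valid_format "$FORMAT"; then
    echo "format ok: $FORMAT"
else
    echo "unsupported format: $FORMAT" >&2
fi
```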

Run queries

You can use hive, beeline, or any other SQL tool to run the queries. The examples below use the hive command-line client.

TPC-DS

$ cd sample-queries-tpcds
$ hive -i testbench.settings
hive> use tpcds_bin_partitioned_orc_1000;
hive> source query55.sql;

TPC-H

$ cd sample-queries-tpch
$ hive -i testbench.settings
hive> use tpch_flat_orc_1000;
hive> source tpch_query1.sql;
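The interactive sessions above can also be expressed as a single non-interactive command, which is convenient for scripting repeated runs. The sketch below only prints the command so that it can be inspected first. The helper name hive_cmd is hypothetical, and it assumes the Hive CLI options -i (initialization file), --database, and -f (script file) are available in your Hive version.

```shell
#!/bin/sh
# Hypothetical helper: build a non-interactive hive command for one query.
# Database and query file names are the ones used in the examples above.
hive_cmd() {
    db=$1
    query=$2
    printf 'hive -i testbench.settings --database %s -f %s\n' "$db" "$query"
}

# Usage: run from inside sample-queries-tpcds; this prints the command only.
hive_cmd tpcds_bin_partitioned_orc_1000 query55.sql
```

To execute the query rather than print it, the printed line can be pasted into the shell from the matching sample-queries directory.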
