HiveTestBench
About Hive TestBench
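Hive TestBench generates TPC-DS and TPC-H data sets and runs their sample queries on Apache Hive. More information is available on the [[https://github.com/hortonworks/hive-testbench|Hive TestBench]] GitHub page.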
Compiling and Running Hive TestBench
Create a local clone
Clone the Hive TestBench repository and change into its directory:
$ git clone https://github.com/hortonworks/hive-testbench.git
$ cd hive-testbench
Compile the benchmark
Build the TPC-DS and TPC-H data generators using the provided scripts:
$ ./tpcds-build.sh
$ ./tpch-build.sh
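Before building, it can help to confirm that the required tools are on your PATH. The sketch below is a hypothetical helper, `check_prereqs`. Its default tool list (gcc, javac, mvn) is our assumption about what the build needs, not something taken from the scripts, so check the project README for the authoritative prerequisites.

```shell
#!/bin/sh
# Hypothetical pre-flight check: verify that each named tool is on PATH.
# The default list (gcc, javac, mvn) is an ASSUMPTION, not read from the
# hive-testbench scripts; pass your own tool names to override it.
check_prereqs() {
  if [ "$#" -eq 0 ]; then
    set -- gcc javac mvn
  fi
  missing=0
  for tool in "$@"; do
    # command -v succeeds only if the tool can be found on PATH
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_prereqs && ./tpcds-build.sh && ./tpch-build.sh
```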
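Choose the scale of data on which to run the tests. As a rough guide, clusters with fewer than 4 nodes should stay within 100 to 250 GB, clusters of 4 to 10 nodes should handle 1 TB, and larger clusters can scale up to higher values. The scale factor passed to the setup scripts is approximately the data size in GB, so a scale of 1000 corresponds to about 1 TB.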
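The sizing guidance can be turned into a starting scale factor with a small helper. This is a hypothetical sketch, `suggest_scale`, assuming the scale factor is roughly GB of data. The exact values it returns (100 for small clusters, 1000 otherwise) are our choice within the guide's ranges and should be tuned per cluster.

```shell
#!/bin/sh
# Hypothetical helper: map a node count to a starting scale factor
# (approximately GB of data). Thresholds follow the sizing guide; the
# returned values are ASSUMED starting points, not official recommendations.
suggest_scale() {
  nodes=$1
  case "$nodes" in
    ''|*[!0-9]*) echo "node count must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$nodes" -lt 1 ]; then
    echo "node count must be a positive integer" >&2
    return 1
  elif [ "$nodes" -lt 4 ]; then
    echo 100     # fewer than 4 nodes: stay in the 100 to 250 GB range
  elif [ "$nodes" -le 10 ]; then
    echo 1000    # 4 to 10 nodes: about 1 TB
  else
    echo 1000    # larger clusters: start at about 1 TB and scale up from there
  fi
}
```

For example, `./tpcds-setup.sh "$(suggest_scale 8)"` would generate about 1 TB.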
Generate data at the chosen scale
For example, to generate about 1 TB of data (scale factor 1000) for each benchmark:
$ ./tpcds-setup.sh 1000
$ ./tpch-setup.sh 1000
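Because data generation can take a long time, a wrapper that validates the scale factor before starting both setup scripts may be useful. The sketch below, `setup_all`, is hypothetical; the `DRY_RUN` variable is our own convention for printing the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical wrapper: run both setup scripts at one scale factor.
# DRY_RUN=1 (our own convention) prints the commands instead of running them.
setup_all() {
  scale=$1
  # Reject empty or non-numeric scale factors before starting a long job.
  case "$scale" in
    ''|*[!0-9]*) echo "usage: setup_all SCALE" >&2; return 1 ;;
  esac
  for bench in tpcds tpch; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "./$bench-setup.sh $scale"
    else
      # Stop at the first failing setup script.
      "./$bench-setup.sh" "$scale" || return 1
    fi
  done
}
```

Run it from the repository root, for example `DRY_RUN=1 setup_all 1000` to preview and then `setup_all 1000` to generate the data.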
By default the setup scripts create tables in ORC format. To use a different format, set the FORMAT environment variable to one of SEQUENCEFILE, TEXTFILE, RCFILE, ORC, PARQUET, or AVRO. For example, to generate RCFile tables at scale 30000:
$ FORMAT=rcfile ./tpcds-setup.sh 30000
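A mistyped FORMAT value is easy to miss until a long run fails, so a small validation step can help. The sketch below, `normalize_format`, is hypothetical: it checks a value against the format list above and lower-cases it. Lower-casing is our assumption, based on the lower-case example (`FORMAT=rcfile`).

```shell
#!/bin/sh
# Hypothetical helper: validate a FORMAT value against the documented list
# and lower-case it (ASSUMED to be the form the scripts expect, as in the
# FORMAT=rcfile example).
normalize_format() {
  fmt=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$fmt" in
    sequencefile|textfile|rcfile|orc|parquet|avro) printf '%s\n' "$fmt" ;;
    *) echo "unsupported format: $1" >&2; return 1 ;;
  esac
}

# Example: only start the setup if the format is valid.
# fmt=$(normalize_format RCFILE) && FORMAT=$fmt ./tpcds-setup.sh 30000
```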
Run queries
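Queries can be run with the Hive CLI (hive), Beeline, or any other SQL client that can connect to Hive. The examples below use the Hive CLI with the testbench.settings file. The database names correspond to ORC data generated at scale 1000; adjust them to match the format and scale you used.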
TPC-DS
$ cd sample-queries-tpcds
$ hive -i testbench.settings
hive> use tpcds_bin_partitioned_orc_1000;
hive> source query55.sql;
TPC-H
$ cd sample-queries-tpch
$ hive -i testbench.settings
hive> use tpch_flat_orc_1000;
hive> source tpch_query1.sql;
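The same queries can also be run non-interactively. The sketch below is hypothetical: `db_name` derives the database name from the pattern in the two examples (tpcds_bin_partitioned_orc_1000, tpch_flat_orc_1000), and that pattern is only inferred for other formats and scales. `query_cmd` prints a Hive CLI command using the standard `-i` (initialization file), `--database`, and `-f` (run a script file) options.

```shell
#!/bin/sh
# Hypothetical helper: derive the database name created by setup.
# The naming pattern is INFERRED from the examples and not verified
# for formats other than ORC.
db_name() {
  bench=$1 fmt=$2 scale=$3
  case "$bench" in
    tpcds) echo "tpcds_bin_partitioned_${fmt}_${scale}" ;;
    tpch)  echo "tpch_flat_${fmt}_${scale}" ;;
    *) echo "unknown benchmark: $bench" >&2; return 1 ;;
  esac
}

# Print (rather than run) a non-interactive Hive CLI command for one query file.
query_cmd() {
  db=$(db_name "$1" "$2" "$3") || return 1
  echo "hive -i testbench.settings --database $db -f $4"
}
```

For example, `query_cmd tpcds orc 1000 query55.sql` prints a command to run from the sample-queries-tpcds directory.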