...
We use Apache Spark 2.2.1 as the streaming framework.
Hadoop Configuration
...
Hadoop is used to generate the input data of the workloads. Create and edit conf/hadoop.conf:

```
cp conf/hadoop.conf.template conf/hadoop.conf
```
Set the following properties:
Property | Example value |
---|---|
hibench.hadoop.home | /root/hadoop |
hibench.hadoop.executable | ${hibench.hadoop.home}/bin/hadoop |
hibench.hadoop.configure.dir | ${hibench.hadoop.home}/etc/hadoop |
hibench.hdfs.master | hdfs://wls-arm-huawei01:9000 |
hibench.hadoop.release | apache (or cdh5, hdp) |
Note: For CDH and HDP users, please update hibench.hadoop.executable, hibench.hadoop.configure.dir and hibench.hadoop.release accordingly. The default values are for the Apache release.
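Putting the values above together, conf/hadoop.conf would look roughly like this (paths and the hostname are the example values from the table and will differ on your cluster):

```
# conf/hadoop.conf -- example values; adjust paths and hostnames to your cluster
hibench.hadoop.home           /root/hadoop
hibench.hadoop.executable     ${hibench.hadoop.home}/bin/hadoop
hibench.hadoop.configure.dir  ${hibench.hadoop.home}/etc/hadoop
hibench.hdfs.master           hdfs://wls-arm-huawei01:9000
hibench.hadoop.release        apache
```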
Kafka Configuration
...
Set the Kafka properties below in conf/hibench.conf and leave the others at their defaults.
Property | Example value / meaning |
---|---|
hibench.streambench.kafka.home | /root/kafka_2.10-0.8.2.2 |
hibench.streambench.zkHost | wls-arm-huawei01:2181 |
hibench.streambench.kafka.brokerList | wls-arm-huawei01:9092 |
hibench.streambench.kafka.topicPartitions | Number of partitions of the generated topic (default 20) |
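For reference, the corresponding section of conf/hibench.conf would look roughly like this (the Kafka path and hostname are the example values from the table, not defaults):

```
# conf/hibench.conf -- Kafka settings (example values; adjust to your cluster)
hibench.streambench.kafka.home             /root/kafka_2.10-0.8.2.2
hibench.streambench.zkHost                 wls-arm-huawei01:2181
hibench.streambench.kafka.brokerList       wls-arm-huawei01:9092
hibench.streambench.kafka.topicPartitions  20
```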
Generate the data
...
Take the workload identity as an example. genSeedDataset.sh generates the seed data on HDFS, and dataGen.sh sends the data to Kafka.
```
bin/workloads/streaming/identity/prepare/genSeedDataset.sh
bin/workloads/streaming/identity/prepare/dataGen.sh
```
Run the streaming application
...
While the data are being sent to Kafka, start the streaming application. Take Spark Streaming as an example.
```
bin/workloads/streaming/identity/spark/run.sh
```
...
Generate the report
...
metrics_reader.sh is used to generate the report.

```
bin/workloads/streaming/identity/common/metrics_reader.sh
```
...
Debugging the streaming benchmark
...
When running the streaming application:

```
bin/workloads/streaming/identity/spark/run.sh
```

the following issues may appear:
1. The streaming identity workload runs into "ERROR CheckpointWriter: Could not submit checkpoint task to the thread pool executor"
...
- Solution: set hibench.streambench.spark.batchInterval in conf/spark.conf to a larger value
2. The data-generating process hangs
Solution: modify conf/hibench.conf so that data generation does not run in infinite mode:
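A minimal sketch of the change, assuming the HiBench data-generator properties hibench.streambench.datagen.totalRounds and hibench.streambench.datagen.totalRecords (where -1 conventionally means "generate forever"; the exact property names may differ between HiBench versions):

```
# conf/hibench.conf -- stop the data generator after a fixed amount of data
# (assumed property names; -1 is the "run forever" sentinel)
hibench.streambench.datagen.totalRounds   100
hibench.streambench.datagen.totalRecords  3000000
```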
...
Arch | count | throughput(msgs/s) | max_latency(ms) | mean_latency(ms) | min_latency(ms) | stddev_latency(ms) | p50_latency(ms) | p75_latency(ms) | p95_latency(ms) | p98_latency(ms) | p99_latency(ms) | p999_latency(ms) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Arm64 | 2015 | 99 | 2339 | 338.531 | 54 | 437.862 | 210 | 307 | 1579 | 1985.32 | 2148 | 2338 |
X86 | 6115 | 100 | 2269 | 162.566 | 25 | 349.292 | 80 | 87 | 974 | 1687.68 | 1817 | 2220 |
Repartition:
Arch | count | throughput(msgs/s) | max_latency(ms) | mean_latency(ms) | min_latency(ms) | stddev_latency(ms) | p50_latency(ms) | p75_latency(ms) | p95_latency(ms) | p98_latency(ms) | p99_latency(ms) | p999_latency(ms) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Arm64 | 2160 | 99 | 2896 | 458.714 | 89 | 573.476 | 257 | 351 | 1961.55 | 2447.12 | 2678.02 | 2885.433 |
X86 | 2930 | 99 | 2720 | 569.22 | 73 | 828.736 | 132 | 503 | 2480.45 | 2549.38 | 2573.07 | 2674.243 |
Wordcount:
Arch | count | throughput(msgs/s) | max_latency(ms) | mean_latency(ms) | min_latency(ms) | stddev_latency(ms) | p50_latency(ms) | p75_latency(ms) | p95_latency(ms) | p98_latency(ms) | p99_latency(ms) | p999_latency(ms) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Arm64 | 4145 | 99 | 3495 | 967.401 | 48 | 844.909 | 743 | 1482.5 | 2695.7 | 3005.36 | 3134.54 | 3445.854 |
X86 | 5674 | 92 | 4825 | 3879.146 | 2534 | 428.128 | 3901 | 4171.25 | 4586 | 4682 | 4731.5 | 4818 |
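To read the tables side by side, a small illustrative script (the numbers are copied from the mean_latency column above; the ratio comparison is our own summary, not part of HiBench):

```python
# Compare Arm64 vs X86 mean latencies reported in the tables above.
# workload: (arm64_mean_latency_ms, x86_mean_latency_ms)
RESULTS = {
    "identity":    (338.531, 162.566),
    "repartition": (458.714, 569.220),
    "wordcount":   (967.401, 3879.146),
}

def latency_ratio(workload: str) -> float:
    """Arm64 mean latency divided by X86 mean latency (>1 means Arm64 is slower)."""
    arm64_ms, x86_ms = RESULTS[workload]
    return round(arm64_ms / x86_ms, 2)

for workload in RESULTS:
    print(workload, latency_ratio(workload))
```

On these numbers Arm64 is slower on identity but faster on repartition and wordcount, which is why per-workload comparison matters more than a single aggregate figure.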
...