
We use Apache Spark 2.2.1 as the streaming framework.


Hadoop Configuration

...

Hadoop is used to generate the input data for the workloads. Create conf/hadoop.conf from the template, then edit it:

Code Block
languagebash
cp conf/hadoop.conf.template conf/hadoop.conf

Set the following properties appropriately:

Property                       Meaning
hibench.hadoop.home            /root/hadoop
hibench.hadoop.executable      ${hibench.hadoop.home}/bin/hadoop
hibench.hadoop.configure.dir   ${hibench.hadoop.home}/etc/hadoop
hibench.hdfs.master            hdfs://wls-arm-huawei01:9000
hibench.hadoop.release         apache, cdh5, hdp

Note: For CDH and HDP users, please update hibench.hadoop.executable, hibench.hadoop.configure.dir, and hibench.hadoop.release accordingly. The default values are for the Apache release.
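
For reference, a filled-in conf/hadoop.conf for an Apache release might look like the sketch below; the paths and host name are the example values from the table above, so adjust them for your cluster.

Code Block
languagebash
# conf/hadoop.conf -- example for an Apache Hadoop release
# (paths and host name are this guide's example values)
hibench.hadoop.home             /root/hadoop
hibench.hadoop.executable       ${hibench.hadoop.home}/bin/hadoop
hibench.hadoop.configure.dir    ${hibench.hadoop.home}/etc/hadoop
hibench.hdfs.master             hdfs://wls-arm-huawei01:9000
hibench.hadoop.release          apache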


Kafka Configuration

...

Set the following Kafka properties in conf/hibench.conf and leave the others at their defaults:

Property                                    Meaning
hibench.streambench.kafka.home              /root/kafka_2.10-0.8.2.2
hibench.streambench.zkHost                  wls-arm-huawei01:2181
hibench.streambench.kafka.brokerList        wls-arm-huawei01:9092
hibench.streambench.kafka.topicPartitions   Number of partitions of the generated topic (default 20)
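
For reference, the Kafka section of conf/hibench.conf might look like the sketch below; the host name and Kafka path are the example values above, and 20 partitions is simply the documented default.

Code Block
languagebash
# Kafka settings in conf/hibench.conf -- example values for this setup
hibench.streambench.kafka.home              /root/kafka_2.10-0.8.2.2
hibench.streambench.zkHost                  wls-arm-huawei01:2181
hibench.streambench.kafka.brokerList        wls-arm-huawei01:9092
hibench.streambench.kafka.topicPartitions   20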


Generate the data

...

Take the identity workload as an example: genSeedDataset.sh generates the seed data on HDFS, and dataGen.sh then sends the data to Kafka.

Code Block
languagebash
bin/workloads/streaming/identity/prepare/genSeedDataset.sh
bin/workloads/streaming/identity/prepare/dataGen.sh
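
To confirm that records are actually reaching Kafka, you can attach a console consumer in another terminal. The sketch below assumes the Kafka 0.8 install path configured above; the topic name is a placeholder, since HiBench generates the actual topic name when the data generator starts.

Code Block
languagebash
# Optional sanity check: read a few records from the benchmark topic.
# Kafka 0.8's console consumer connects via ZooKeeper.
# <generated_topic> is a placeholder -- use the topic created by dataGen.sh
# (list topics with kafka-topics.sh --list --zookeeper wls-arm-huawei01:2181).
/root/kafka_2.10-0.8.2.2/bin/kafka-console-consumer.sh \
  --zookeeper wls-arm-huawei01:2181 \
  --topic <generated_topic>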

Run the streaming application

...

While the data are being sent to Kafka, start the streaming application. Take Spark Streaming as an example:

Code Block
languagebash
bin/workloads/streaming/identity/spark/run.sh

...

Generate the report

...

metrics_reader.sh is used to generate the report.

Code Block
languagebash
bin/workloads/streaming/identity/common/metrics_reader.sh

...

Debug the streaming benchmark

...

When running the streaming application:

Code Block
languagebash
bin/workloads/streaming/identity/spark/run.sh


The following issues came up:

  1. The streaming identity workload runs into "ERROR CheckpointWriter: Could not submit checkpoint task to the thread pool executor".

...

  • Set hibench.streambench.spark.batchInterval in conf/spark.conf to a larger value, as in the sketch below.
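
A minimal sketch of the change (the value 1000 is illustrative, not a recommendation; check the comment in your conf/spark.conf template for the unit the property expects):

Code Block
languagebash
# conf/spark.conf -- give CheckpointWriter more headroom between batches
# (1000 is an illustrative value; the template documents the expected unit)
hibench.streambench.spark.batchInterval    1000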


  2. The process of generating data hangs.

The solution:

Modify conf/hibench.conf so that data generation does not run in infinite mode:
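
The sketch below assumes the datagen property names from the HiBench configuration template, where a value of -1 means "run forever"; the counts themselves are illustrative.

Code Block
languagebash
# conf/hibench.conf -- stop the generator after a fixed amount of data
# (-1 means infinity in the template; these counts are illustrative)
hibench.streambench.datagen.totalRounds     100
hibench.streambench.datagen.totalRecords    6000000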

...

Arch  | count | throughput(msgs/s) | max_latency(ms) | mean_latency(ms) | min_latency(ms) | stddev_latency(ms) | p50_latency(ms) | p75_latency(ms) | p95_latency(ms) | p98_latency(ms) | p99_latency(ms) | p999_latency(ms)
Arm64 | 2015992339338.53154437.86221030715791985.3221482338
X86   | 61151002269162.56625349.29280879741687.6818172220


Repartition:

Arch  | count | throughput(msgs/s) | max_latency(ms) | mean_latency(ms) | min_latency(ms) | stddev_latency(ms) | p50_latency(ms) | p75_latency(ms) | p95_latency(ms) | p98_latency(ms) | p99_latency(ms) | p999_latency(ms)
Arm64 | 2160992896458.71489573.4762573511961.552447.122678.022885.433
X86   | 2930992720569.2273828.7361325032480.452549.382573.072674.243


Wordcount:

Arch  | count | throughput(msgs/s) | max_latency(ms) | mean_latency(ms) | min_latency(ms) | stddev_latency(ms) | p50_latency(ms) | p75_latency(ms) | p95_latency(ms) | p98_latency(ms) | p99_latency(ms) | p999_latency(ms)
Arm64 | 4145993495967.40148844.9097431482.52695.73005.363134.543445.854
X86   | 56749248253879.1462534428.12839014171.25458646824731.54818

...