Spark profiling on Arm64
Statsd-jvm-profiler with Hibench
Statsd-jvm-profiler is a JVM agent profiler that sends profiling data to StatsD or InfluxDB. It was originally built for profiling Hadoop and Spark jobs, but it can be used with any JVM process.
1. Prerequisites
1.1 Install InfluxDB
a. Install Golang
apt-get install golang-1.9
b. get source
mkdir $HOME/gocodez
export GOPATH=$HOME/gocodez
go get github.com/influxdata/influxdb
c. Build
cd $GOPATH/src/github.com/influxdata/influxdb
gdm restore
go clean ./...
go install ./...
d. Start influxDB
$GOPATH/bin/influxd
# Create the database and user (profiler / profiler)
influx -precision rfc3339
CREATE DATABASE profiler
CREATE USER profiler WITH PASSWORD 'profiler' WITH ALL PRIVILEGES
1.2 Install Statsd-jvm-profiler Dependencies
sudo easy_install pip
pip install influxdb
pip install blist
2 Installation and Configuration
The statsd-jvm-profiler JAR must be present on every machine where a profiled JVM will run.
The JAR can be built with mvn package; a relatively recent Maven (at least Maven 3) is required.
statsd-jvm-profiler is available in Maven Central:
<dependency>
    <groupId>com.etsy</groupId>
    <artifactId>statsd-jvm-profiler</artifactId>
    <version>2.0.0</version>
</dependency>
2.1 Get source and Build Statsd-jvm-profiler
git clone https://github.com/etsy/statsd-jvm-profiler
# Build, skipping the unit tests
mvn package -DskipTests
Deploy the JAR to the machines that run the executor processes. One way to do this is spark-submit's --jars option, which ships the JAR to each executor:
--jars /path/to/statsd-jvm-profiler-2.1.1-jar-with-dependencies.jar
2.2 Set Additional options:
spark.executor.extraJavaOptions=-javaagent:/root/statsd-jvm-profiler-2.1.1-SNAPSHOT-jar-with-dependencies.jar=server=wls-arm-cavium02.shanghai.arm.com,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=yuqi.XXX,tagMapping=XXX.test
- statsd-jvm-profiler-2.1.1-SNAPSHOT-jar-with-dependencies.jar: the JAR built by mvn package
- server/port: host running InfluxDB; port defaults to 8086
- reporter: where profiling data is sent (StatsD or InfluxDB; here InfluxDBReporter)
- prefix: measurement-name prefix in InfluxDB
- tagMapping: tag name in InfluxDB
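The agent argument is a single comma-separated key=value string appended after the JAR path. A small helper like the following (hypothetical, not part of statsd-jvm-profiler) can assemble it and avoid quoting mistakes:

```python
# Hypothetical helper: assemble the -javaagent option string for
# statsd-jvm-profiler from a dict of agent parameters.
def build_agent_opt(jar_path, params):
    kv = ",".join(f"{k}={v}" for k, v in params.items())
    return f"-javaagent:{jar_path}={kv}"

opt = build_agent_opt(
    "/root/statsd-jvm-profiler-2.1.1-SNAPSHOT-jar-with-dependencies.jar",
    {
        "server": "wls-arm-cavium02.shanghai.arm.com",  # InfluxDB host
        "port": 8086,                                   # default InfluxDB port
        "reporter": "InfluxDBReporter",
        "database": "profiler",
        "username": "profiler",
        "password": "profiler",
        "prefix": "yuqi.sleep",
        "tagMapping": "sleep.test",
    },
)
print(opt)
```

The resulting string is what goes after spark.executor.extraJavaOptions= in the configuration above.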
Add the additional options to the Hibench submit script.
For the Hibench sleep example:
diff --git a/bin/functions/workload_functions.sh b/bin/functions/workload_functions.sh
index 2127f3e..2c12a8a 100644
--- a/bin/functions/workload_functions.sh
+++ b/bin/functions/workload_functions.sh
@@ -198,6 +198,8 @@ function run_spark_job() {
     export_withlog SPARKBENCH_PROPERTIES_FILES
 
+    HI_PROP_OPTS="--conf spark.executor.extraJavaOptions=-javaagent:/root/statsd-jvm-profiler-2.1.1-SNAPSHOT-jar-with-dependencies.jar=server=wls-arm-cavium02.shanghai.arm.com,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=yuqi.sleep,tagMapping=sleep.test"
+
     YARN_OPTS=""
     if [[ "$SPARK_MASTER" == yarn-* ]]; then
         export_withlog HADOOP_CONF_DIR
@@ -215,9 +217,9 @@ function run_spark_job() {
     fi
     if [[ "$CLS" == *.py ]]; then
         LIB_JARS="$LIB_JARS --jars ${SPARKBENCH_JAR}"
-        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@"
+        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${HI_PROP_OPTS} ${CLS} $@"
     else
-        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@"
+        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${HI_PROP_OPTS} ${SPARKBENCH_JAR} $@"
     fi
3. Profiling results
3.1 Get Stack Dump from InfluxDB
Get dump tool:
https://github.com/etsy/statsd-jvm-profiler/blob/master/visualization/influxdb_dump.py
For the Hibench sleep example, sleep.stack is the output file:
influxdb_dump.py -o "wls-arm-cavium02.shanghai.arm.com" -u profiler -p profiler -d profiler -t sleep.test -e yuqi.sleep > sleep.stack
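The dump should be in collapsed-stack format, the input flamegraph.pl expects: one line per unique stack, frames separated by semicolons, followed by a sample count. A quick sanity check of the file, assuming that format:

```python
# Sanity-check a collapsed-stack file (e.g. sleep.stack):
# count unique stacks and total samples before feeding it to flamegraph.pl.
def stack_summary(lines):
    stacks, total = 0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Last whitespace-separated token is the sample count.
        _frames, _, count = line.rpartition(" ")
        total += int(count)
        stacks += 1
    return stacks, total

sample = [
    "java.lang.Thread.run;sun.nio.ch.EPollArrayWrapper.epollWait 497",
    "java.lang.Thread.run;org.apache.spark.rdd.RDD.iterator 53",
]
print(stack_summary(sample))  # -> (2, 550)
```

In practice, replace the sample list with open("sleep.stack"); an exception on int(count) indicates a malformed dump.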
3.2 Generate Flame graph
Get tools:
https://github.com/brendangregg/FlameGraph/blob/master/flamegraph.pl
Generate a flame graph from the stack file dumped from InfluxDB:
flamegraph.pl sleep.stack > sleep.svg
4. Profiling Analysis of Hibench Flame Graph
Spark Sleep Graph
The original SVG file: sleep.svg
Spark Sort Graph
The original SVG file: Sort.svg
Spark Terasort Graph
The original SVG file: Terasort.svg
Spark WordCount Graph
The original SVG file: wordCount.svg
High CPU utilization rank
From the four flame graphs above:
Rank (%) | Sort | Terasort | wordCount | Sleep |
---|---|---|---|---|
1 | sun.nio.ch.EPollArrayWrapper.epollWait 49.66% | sun.nio.ch.EPollArrayWrapper.epollWait 62.85% | sun.nio.ch.EPollArrayWrapper.epollWait 48.17% | io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run -> sun.nio.ch.EPollArrayWrapper.epollWait 76.71% |
2 | io.netty.util.concurrent.SingleThreadEventExecutor$2.run -> sun.nio.ch.EPollArrayWrapper.epollWait 14.8% | io.netty.util.concurrent.SingleThreadEventExecutor$2.run -> sun.nio.ch.EPollArrayWrapper.epollWait 12.53% | io.netty.util.concurrent.SingleThreadEventExecutor$2.run -> sun.nio.ch.EPollArrayWrapper.epollWait 19.78% | io.netty.util.concurrent.SingleThreadEventExecutor$2.run -> sun.nio.ch.EPollArrayWrapper.epollWait 16.52% |
3 | org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write 13.34% | org.apache.spark.scheduler.ResultTask.runTask 13.67% | org.apache.spark.executor.CoarseGrainedExecutorBackend$.main | java.util.concurrent.ThreadPoolExecutor.runWorker 0.1% |
4 | org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0 13.24% | org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy | org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1 | com.squareup.okhttp.ConnectionPool.performCleanup 0.05% |
5 | org.apache.spark.rdd.RDD.iterator 5.26% | org.apache.spark.executor.CoarseGrainedExecutorBackend.main | org.apache.spark.scheduler.ShuffleMapTask.runTask | org.apache.spark.rpc.netty.Inbox.process 0.05% |
6 | org.apache.spark.serializer.KryoSerializer.newKryo 3.51% | org.apache.spark.SparkEnv$.createExecutorEnv | org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0 | org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp 0.05% |
7 | org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy 3.02% | org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0 | org.apache.spark.deploy.SparkHadoopUtil.<init> | java.net.URLClassLoader.findClass 0.05% |
8 | com.twitter.chill.AllScalaRegistrar.apply 3.02% | sun.reflect.NativeConstructorAccessorImpl.newInstance0 | org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser | sun.misc.URLClassPath$JarLoader.getResource 0.05% |
9 | org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp 2.14% | sun.misc.ProxyGenerator.generateClassFile | org.apache.spark.util.collection.ExternalSorter.insertAll | java.util.jar.JarFile.getJarEntry 0.05% |
10 | com.esotericsoftware.kryo.Kryo.<init> 2.04% | org.apache.spark.metrics.sink.MetricsServlet.<init> | org.apache.spark.scheduler.ShuffleMapTask.runTask | java.util.zip.ZipFile.getEntry 0.05% |
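Rankings like the table above can be reproduced from a collapsed-stack dump by attributing each sample to one frame and normalizing by the total sample count. A minimal sketch, assuming the "frame;frame count" collapsed format and ranking by leaf frame:

```python
from collections import Counter

# Rank leaf frames by their share of total samples in a collapsed-stack
# file, mirroring the "High CPU utilization rank" table above.
def rank_leaf_frames(lines, top=3):
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        frames, _, n = line.rpartition(" ")
        counts[frames.split(";")[-1]] += int(n)  # leaf frame gets the samples
    total = sum(counts.values())
    return [(f, 100.0 * c / total) for f, c in counts.most_common(top)]

sample = [
    "a.run;sun.nio.ch.EPollArrayWrapper.epollWait 60",
    "b.run;sun.nio.ch.EPollArrayWrapper.epollWait 20",
    "c.run;org.apache.spark.rdd.RDD.iterator 20",
]
for frame, pct in rank_leaf_frames(sample):
    print(f"{frame}: {pct:.1f}%")
```

Running this against sleep.stack (instead of the toy sample list) should surface sun.nio.ch.EPollArrayWrapper.epollWait at the top, as in the table.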