Spark profiling on Arm64
Statsd-jvm-profiler with Hibench
Statsd-jvm-profiler is a JVM agent profiler that sends profiling data to StatsD or InfluxDB. It was originally built for profiling Hadoop and Spark jobs, but it can be used with any JVM process.
1. Prerequisites
1.1 Install InfluxDB
a. Install Golang
apt-get install golang-1.9
b. get source
mkdir $HOME/gocodez
export GOPATH=$HOME/gocodez
go get github.com/influxdata/influxdb
c. Build
cd $GOPATH/src/github.com/influxdata/influxdb
gdm restore
go clean ./...
go install ./...
d. Start influxDB
$GOPATH/bin/influxd
# Create the database and user (profiler / profiler)
influx -precision rfc3339
CREATE DATABASE profiler
CREATE USER profiler WITH PASSWORD 'profiler' WITH ALL PRIVILEGES
1.2 Install Statsd-jvm-profiler Dependencies
sudo easy_install pip
pip install influxdb
pip install blist
2 Installation and Configuration
The statsd-jvm-profiler JAR must be present on every machine where a profiled JVM will run.
The JAR can be built with mvn package; a relatively recent Maven (at least Maven 3) is required.
statsd-jvm-profiler is available in Maven Central:
<dependency>
    <groupId>com.etsy</groupId>
    <artifactId>statsd-jvm-profiler</artifactId>
    <version>2.0.0</version>
</dependency>
2.1 Get source and Build Statsd-jvm-profiler
git clone https://github.com/etsy/statsd-jvm-profiler
# Build, skipping the unit tests
mvn package -DskipTests
Deploy the JAR to the machines that run the executor processes. One way to do this is spark-submit's --jars option, which ships the JAR to each executor:
--jars /path/to/statsd-jvm-profiler-2.1.1-jar-with-dependencies.jar
2.2 Set Additional options:
spark.executor.extraJavaOptions=-javaagent:/root/statsd-jvm-profiler-2.1.1-SNAPSHOT-jar-with-dependencies.jar=server=wls-arm-cavium02.shanghai.arm.com,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=yuqi.XXX,tagMapping=XXX.test
- statsd-jvm-profiler-2.1.1-SNAPSHOT-jar-with-dependencies.jar: the JAR built by mvn package
- server/port: host running InfluxDB; port defaults to 8086
- reporter: where profiling data is sent (StatsD or InfluxDB; here InfluxDBReporter)
- prefix: measurement-name prefix in InfluxDB
- tagMapping: tag name in InfluxDB
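The agent argument is a single comma-separated key=value string appended after the JAR path. A small helper like the following (hypothetical, not part of statsd-jvm-profiler) can assemble it and avoid quoting mistakes:

```python
# Hypothetical helper: assemble the -javaagent option string for
# statsd-jvm-profiler from a dict of agent parameters.
def build_agent_opt(jar_path, params):
    kv = ",".join(f"{k}={v}" for k, v in params.items())
    return f"-javaagent:{jar_path}={kv}"

opt = build_agent_opt(
    "/root/statsd-jvm-profiler-2.1.1-SNAPSHOT-jar-with-dependencies.jar",
    {
        "server": "wls-arm-cavium02.shanghai.arm.com",  # InfluxDB host
        "port": 8086,                                   # default InfluxDB port
        "reporter": "InfluxDBReporter",
        "database": "profiler",
        "username": "profiler",
        "password": "profiler",
        "prefix": "yuqi.sleep",
        "tagMapping": "sleep.test",
    },
)
print(opt)
```

The resulting string is what goes after spark.executor.extraJavaOptions= in the configuration above.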
Add the additional options to the Hibench submit script.
For the Hibench sleep example:
diff --git a/bin/functions/workload_functions.sh b/bin/functions/workload_functions.sh
index 2127f3e..2c12a8a 100644
--- a/bin/functions/workload_functions.sh
+++ b/bin/functions/workload_functions.sh
@@ -198,6 +198,8 @@ function run_spark_job() {
     export_withlog SPARKBENCH_PROPERTIES_FILES
 
+    HI_PROP_OPTS="--conf spark.executor.extraJavaOptions=-javaagent:/root/statsd-jvm-profiler-2.1.1-SNAPSHOT-jar-with-dependencies.jar=server=wls-arm-cavium02.shanghai.arm.com,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=yuqi.sleep,tagMapping=sleep.test"
+
     YARN_OPTS=""
     if [[ "$SPARK_MASTER" == yarn-* ]]; then
         export_withlog HADOOP_CONF_DIR
@@ -215,9 +217,9 @@ function run_spark_job() {
     fi
     if [[ "$CLS" == *.py ]]; then
         LIB_JARS="$LIB_JARS --jars ${SPARKBENCH_JAR}"
-        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@"
+        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${HI_PROP_OPTS} ${CLS} $@"
     else
-        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@"
+        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${HI_PROP_OPTS} ${SPARKBENCH_JAR} $@"
     fi
3. Profiling results
3.1 Get Stack Dump from InfluxDB
Get dump tool:
https://github.com/etsy/statsd-jvm-profiler/blob/master/visualization/influxdb_dump.py
For the Hibench sleep example, sleep.stack is the output file:
influxdb_dump.py -o "wls-arm-cavium02.shanghai.arm.com" -u profiler -p profiler -d profiler -t sleep.test -e yuqi.sleep > sleep.stack
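The dump should be in collapsed-stack format, the input flamegraph.pl expects: one line per unique stack, frames separated by semicolons, followed by a sample count. A quick sanity check of the file, assuming that format:

```python
# Sanity-check a collapsed-stack file (e.g. sleep.stack):
# count unique stacks and total samples before feeding it to flamegraph.pl.
def stack_summary(lines):
    stacks, total = 0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Last whitespace-separated token is the sample count.
        _frames, _, count = line.rpartition(" ")
        total += int(count)
        stacks += 1
    return stacks, total

sample = [
    "java.lang.Thread.run;sun.nio.ch.EPollArrayWrapper.epollWait 497",
    "java.lang.Thread.run;org.apache.spark.rdd.RDD.iterator 53",
]
print(stack_summary(sample))  # -> (2, 550)
```

In practice, replace the sample list with open("sleep.stack"); an exception on int(count) indicates a malformed dump.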
3.2 Generate Flame graph
Get tools:
https://github.com/brendangregg/FlameGraph/blob/master/flamegraph.pl
Generate a flame graph from the stack file dumped from InfluxDB:
flamegraph.pl sleep.stack > sleep.svg
4. Profiling Analysis of Hibench Flame Graph
Spark Sleep Graph
The original SVG file: sleep.svg
Spark Sort Graph
The original SVG file: Sort.svg
Spark Terasort Graph
The original SVG file: Terasort.svg
Spark WordCount Graph
The original SVG file: wordCount.svg
High CPU utilization rank
From the four flame graphs above:
Rank (%) | Sort | Terasort | wordCount | Sleep |
---|---|---|---|---|
1 | sun.nio.ch.EPollArrayWrapper.epollWait 49.66% | sun.nio.ch.EPollArrayWrapper.epollWait 62.85% | sun.nio.ch.EPollArrayWrapper.epollWait 48.17% | io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run -> sun.nio.ch.EPollArrayWrapper.epollWait 76.71% |
2 | io.netty.util.concurrent.SingleThreadEventExecutor$2.run -> sun.nio.ch.EPollArrayWrapper.epollWait 14.8% | io.netty.util.concurrent.SingleThreadEventExecutor$2.run -> sun.nio.ch.EPollArrayWrapper.epollWait 12.53% | io.netty.util.concurrent.SingleThreadEventExecutor$2.run -> sun.nio.ch.EPollArrayWrapper.epollWait 19.78% | io.netty.util.concurrent.SingleThreadEventExecutor$2.run -> sun.nio.ch.EPollArrayWrapper.epollWait 16.52% |
3 | org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write 13.34% | org.apache.spark.scheduler.ResultTask.runTask 13.67% | org.apache.spark.executor.CoarseGrainedExecutorBackend$.main | java.util.concurrent.ThreadPoolExecutor.runWorker 0.1% |
4 | org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0 13.24% | org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy | org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1 | com.squareup.okhttp.ConnectionPool.performCleanup 0.05% |
5 | org.apache.spark.rdd.RDD.iterator 5.26% | org.apache.spark.executor.CoarseGrainedExecutorBackend.main | org.apache.spark.scheduler.ShuffleMapTask.runTask | org.apache.spark.rpc.netty.Inbox.process 0.05% |
6 | org.apache.spark.serializer.KryoSerializer.newKryo 3.51% | org.apache.spark.SparkEnv$.createExecutorEnv | org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0 | org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp 0.05% |
7 | org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy 3.02% | org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0 | org.apache.spark.deploy.SparkHadoopUtil.<init> | java.net.URLClassLoader.findClass 0.05% |
8 | com.twitter.chill.AllScalaRegistrar.apply 3.02% | sun.reflect.NativeConstructorAccessorImpl.newInstance0 | org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser | sun.misc.URLClassPath$JarLoader.getResource 0.05% |
9 | org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp 2.14% | sun.misc.ProxyGenerator.generateClassFile | org.apache.spark.util.collection.ExternalSorter.insertAll | java.util.jar.JarFile.getJarEntry 0.05% |
10 | com.esotericsoftware.kryo.Kryo.<init> 2.04% | org.apache.spark.metrics.sink.MetricsServlet.<init> | org.apache.spark.scheduler.ShuffleMapTask.runTask | java.util.zip.ZipFile.getEntry 0.05% |
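Rankings like the table above can be reproduced from a collapsed-stack dump by attributing each sample to one frame and normalizing by the total sample count. A minimal sketch, assuming the "frame;frame count" collapsed format and ranking by leaf frame:

```python
from collections import Counter

# Rank leaf frames by their share of total samples in a collapsed-stack
# file, mirroring the "High CPU utilization rank" table above.
def rank_leaf_frames(lines, top=3):
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        frames, _, n = line.rpartition(" ")
        counts[frames.split(";")[-1]] += int(n)  # leaf frame gets the samples
    total = sum(counts.values())
    return [(f, 100.0 * c / total) for f, c in counts.most_common(top)]

sample = [
    "a.run;sun.nio.ch.EPollArrayWrapper.epollWait 60",
    "b.run;sun.nio.ch.EPollArrayWrapper.epollWait 20",
    "c.run;org.apache.spark.rdd.RDD.iterator 20",
]
for frame, pct in rank_leaf_frames(sample):
    print(f"{frame}: {pct:.1f}%")
```

Running this against sleep.stack (instead of the toy sample list) should surface sun.nio.ch.EPollArrayWrapper.epollWait at the top, as in the table.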