BPF Compiler Collection (BCC)—— A profiling Tool from Kernel Perspective
1.Overview
BPF (Berkeley Packet Filter) were added to the Linux 4.x series kernels, allowing BPF to do much more than just filtering packets. These enhancements allow custom analysis programs to be executed on Linux dynamic tracing, static tracing, and profiling events. Linux Enhanced BPF (eBPF) has raw tracing capabilities similar to those of DTrace and SystemTap. eBPF can also be used for security modules and software defined networks.
Also note: eBPF is often called just "BPF", especially on lkml.
BPF originated as a technology for optimizing packet filters. If you run tcpdump with an expression (matching on a host or port), it gets compiled into optimal BPF bytecode which is executed by an in-kernel sandboxed virtual machine. Enhanced BPF (aka eBPF, but also just BPF) extended what this BPF virtual machine could do: allowing it to run on events other than packets, and do actions other than filtering.
eBPF can be used to for software defined networks, DDoS mitigation (early packet drop), improving network performance (eXpress Data Path), intrusion detection, and more. On this page I'm focusing on its use for observability tools, where it is used as shown in the following workflow:
BCC is a toolkit for creating efficient kernel tracing and manipulation programs, and includes several useful tools and examples. It makes use of extended BPF (Berkeley Packet Filters), formally known as eBPF, a new feature that was first added to Linux 3.15. Much of what BCC uses requires Linux 4.1 and above. It provides a large collection of tracing examples tools, as well as Python and Lua interfaces for developing them.
2. Enable BCC on Arm64
See the bcc install instructions for getting started on different Linux distros.
But some difference on Arm64 platform. Here are Ubuntu instructions, as an example:
2.1 Prerequisites
Rebuild Kernel with the following flags set;
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
# [optional, for tc filters]
CONFIG_NET_CLS_BPF=m
# [optional, for tc actions]
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
# [optional, for kprobes]
CONFIG_BPF_EVENTS=y
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_DUMMY=m
CONFIG_VXLAN=m2.2 LLvm Arm64 rebuild
# get source and build
git clone https://git.llvm.org/git/llvm.git
cd llvm/tools
git clone https://git.llvm.org/git/clang.git
cd llvm/projects
git clone https://git.llvm.org/git/compiler-rt.git
cd llvm/projects
git clone https://git.llvm.org/git/openmp.git
cd llvm/projects
git clone https://git.llvm.org/git/libcxx.git
git clone https://git.llvm.org/git/libcxxabi.git
cd llvm/projects
git clone https://git.llvm.org/git/test-suite.git
# Build
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="ARM;X86;AArch64;BPF"
make
make check-all
sudo make install
sudo ldconfig2.3 Rebuild luaJIT for Arm64
# upstream source do not support Arm64;
# So get Arm support source from: http://luajit.org/download.html
wget http://luajit.org/download/LuaJIT-2.1.0-beta3.tar.gz
# Build Arm64 Lua
tar zxvf LuaJIT-2.1.0-beta3.tar.gz
make && make install
2.4 Install BCC on Arm64
git clone https://github.com/iovisor/bcc.git
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr
make
sudo make install
sudo ldconfig3. BCC Tools Commands
These tools include execsnoop, opensnoop, ext4slower, biolatency, tcpconnect, oomkill, runqlat, etc.
They are easy to learn and use:
execsnoop
Trace new processes via exec() syscalls, and print the parent process name and other details:
# execsnoop
PCOMM PID RET ARGS
bash 15887 0 /usr/bin/man ls
preconv 15894 0 /usr/bin/preconv -e UTF-8
man 15896 0 /usr/bin/tbl
man 15897 0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
man 15898 0 /usr/bin/pager -s
nroff 15900 0 /usr/bin/locale charmap
nroff 15901 0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n
groff 15902 0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8
groff 15903 0 /usr/bin/grotty
[...]opensnoop
Trace open() syscalls and print process name and path name details:
# opensnoop
PID COMM FD ERR PATH
27159 catalina.sh 3 0 /apps/tomcat8/bin/setclasspath.sh
4057 redis-server 5 0 /proc/4057/stat
2360 redis-server 5 0 /proc/2360/stat
30668 sshd 4 0 /proc/sys/kernel/ngroups_max
30668 sshd 4 0 /etc/group
30668 sshd 4 0 /root/.ssh/authorized_keys
30668 sshd 4 0 /root/.ssh/authorized_keys
30668 sshd -1 2 /var/run/nologin
30668 sshd -1 2 /etc/nologin
30668 sshd 4 0 /etc/login.defs
30668 sshd 4 0 /etc/passwd
30668 sshd 4 0 /etc/shadow
30668 sshd 4 0 /etc/localtime
4510 snmp-pass 4 0 /proc/cpuinfo
[...]ext4slower
Trace slow ext4 operations that are slower than a provided threshold (bcc has versions of this for btrfs, XFS, and ZFS as well):
# ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
06:49:17 bash 3616 R 128 0 7.75 cksum
06:49:17 cksum 3616 R 39552 0 1.34 [
06:49:17 cksum 3616 R 96 0 5.36 2to3-2.7
06:49:17 cksum 3616 R 96 0 14.94 2to3-3.4
06:49:17 cksum 3616 R 10320 0 6.82 411toppm
06:49:17 cksum 3616 R 65536 0 4.01 a2p
06:49:17 cksum 3616 R 55400 0 8.77 ab
06:49:17 cksum 3616 R 36792 0 16.34 aclocal-1.14biolatency
Summarize block device I/O latency as a histogram every second:
# biolatency -mT 1
Tracing block device I/O... Hit Ctrl-C to end.
21:33:40
msecs : count distribution
0 -> 1 : 69 |****************************************|
2 -> 3 : 16 |********* |
4 -> 7 : 6 |*** |
8 -> 15 : 21 |************ |
16 -> 31 : 16 |********* |
32 -> 63 : 5 |** |
64 -> 127 : 1 | |
21:33:41
msecs : count distribution
0 -> 1 : 60 |************************ |
2 -> 3 : 100 |****************************************|
4 -> 7 : 41 |**************** |
8 -> 15 : 11 |**** |
16 -> 31 : 9 |*** |
32 -> 63 : 6 |** |
64 -> 127 : 4 |* |
[...]biosnoop
Trace block device I/O with process, disk, and latency details:
# biosnoop
TIME(s) COMM PID DISK T SECTOR BYTES LAT(ms)
0.000004001 supervise 1950 xvda1 W 13092560 4096 0.74
0.000178002 supervise 1950 xvda1 W 13092432 4096 0.61
0.001469001 supervise 1956 xvda1 W 13092440 4096 1.24
0.001588002 supervise 1956 xvda1 W 13115128 4096 1.09
1.022346001 supervise 1950 xvda1 W 13115272 4096 0.98
1.022568002 supervise 1950 xvda1 W 13188496 4096 0.93
1.023534000 supervise 1956 xvda1 W 13188520 4096 0.79
1.023585003 supervise 1956 xvda1 W 13189512 4096 0.60cachestat
Show the page cache hit/miss ratio and size, and summarize every second:
# cachestat
HITS MISSES DIRTIES READ_HIT% WRITE_HIT% BUFFERS_MB CACHED_MB
170610 41607 33 80.4% 19.6% 11 288
157693 6149 33 96.2% 3.7% 11 311
174483 20166 26 89.6% 10.4% 12 389
434778 35 40 100.0% 0.0% 12 389
[...]tcpconnect
Trace TCP active connections (connect()):
# tcpconnect
PID COMM IP SADDR DADDR DPORT
25333 recordProgra 4 127.0.0.1 127.0.0.1 28527
25338 curl 4 100.66.3.172 52.22.109.254 80
25340 curl 4 100.66.3.172 31.13.73.36 80
25342 curl 4 100.66.3.172 104.20.25.153 80
25220 ssh 6 fe80::8a3:9dff:fed5:6b19 fe80::8a3:9dff:fed5:6b19 22
[...]tcpaccept
Trace TCP passive connections (accept()):
# tcpaccept
PID COMM IP RADDR LADDR LPORT
2287 sshd 4 11.16.213.254 100.66.3.172 22
4057 redis-server 4 127.0.0.1 127.0.0.1 28527
4057 redis-server 4 127.0.0.1 127.0.0.1 28527
4057 redis-server 4 127.0.0.1 127.0.0.1 28527
2287 sshd 6 fe80::8a3:9dff:fed5:6b19 fe80::8a3:9dff:fed5:6b19 22
4057 redis-server 4 127.0.0.1 127.0.0.1 28527
[...]tcpretrans
Trace TCP retransmits and TLPs:
# tcpretrans
TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:17 0 4 10.153.223.157:22 R> 69.53.245.40:22957 ESTABLISHED
[...]gethostlatency
Show latency for getaddrinfo/gethostbyname[2] library calls, system wide:
# gethostlatency
TIME PID COMM LATms HOST
06:10:24 28011 wget 90.00 www.iovisor.org
06:10:28 28127 wget 0.00 www.iovisor.org
06:11:25 29404 curl 72.00 foo
06:11:28 29475 curl 1.00 foo
[...]runqlat
Show run queue (scheduler) latency as a histogram, every 5 seconds:
# runqlat -m 5
Tracing run queue latency... Hit Ctrl-C to end.
msecs : count distribution
0 -> 1 : 2085 |****************************************|
2 -> 3 : 8 | |
4 -> 7 : 20 | |
8 -> 15 : 191 |*** |
16 -> 31 : 420 |******** |
msecs : count distribution
0 -> 1 : 1798 |****************************************|
2 -> 3 : 11 | |
4 -> 7 : 45 |* |
8 -> 15 : 441 |********* |
16 -> 31 : 1030 |********************** |
[...]profile
Sample stack traces at 49 Hertz, then print unique stacks with the number of occurrences seen:
# profile
Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
^C
[...]
ffffffff811a2eb0 find_get_entry
ffffffff811a338d pagecache_get_page
ffffffff811a51fa generic_file_read_iter
ffffffff81231f30 __vfs_read
ffffffff81233063 vfs_read
ffffffff81234565 SyS_read
ffffffff818739bb entry_SYSCALL_64_fastpath
00007f4757ff9680 read
- dd (14283)
29
[...]