BPF Compiler Collection (BCC)—— A profiling Tool from Kernel Perspective
1.Overview
BPF (Berkeley Packet Filter) were added to the Linux 4.x series kernels, allowing BPF to do much more than just filtering packets. These enhancements allow custom analysis programs to be executed on Linux dynamic tracing, static tracing, and profiling events. Linux Enhanced BPF (eBPF) has raw tracing capabilities similar to those of DTrace and SystemTap. eBPF can also be used for security modules and software defined networks.
Also note: eBPF is often called just "BPF", especially on lkml.
BPF originated as a technology for optimizing packet filters. If you run tcpdump with an expression (matching on a host or port), it gets compiled into optimal BPF bytecode which is executed by an in-kernel sandboxed virtual machine. Enhanced BPF (aka eBPF, but also just BPF) extended what this BPF virtual machine could do: allowing it to run on events other than packets, and do actions other than filtering.
eBPF can be used to for software defined networks, DDoS mitigation (early packet drop), improving network performance (eXpress Data Path), intrusion detection, and more. On this page I'm focusing on its use for observability tools, where it is used as shown in the following workflow:
BCC is a toolkit for creating efficient kernel tracing and manipulation programs, and includes several useful tools and examples. It makes use of extended BPF (Berkeley Packet Filters), formally known as eBPF, a new feature that was first added to Linux 3.15. Much of what BCC uses requires Linux 4.1 and above. It provides a large collection of tracing examples tools, as well as Python and Lua interfaces for developing them.
2. Enable BCC on Arm64
See the bcc install instructions for getting started on different Linux distros.
But some difference on Arm64 platform. Here are Ubuntu instructions, as an example:
2.1 Prerequisites
Rebuild Kernel with the following flags set;
CONFIG_BPF=y CONFIG_BPF_SYSCALL=y # [optional, for tc filters] CONFIG_NET_CLS_BPF=m # [optional, for tc actions] CONFIG_NET_ACT_BPF=m CONFIG_BPF_JIT=y CONFIG_HAVE_BPF_JIT=y # [optional, for kprobes] CONFIG_BPF_EVENTS=y CONFIG_NET_SCH_SFQ=m CONFIG_NET_ACT_POLICE=m CONFIG_NET_ACT_GACT=m CONFIG_DUMMY=m CONFIG_VXLAN=m
2.2 LLvm Arm64 rebuild
# get source and build git clone https://git.llvm.org/git/llvm.git cd llvm/tools git clone https://git.llvm.org/git/clang.git cd llvm/projects git clone https://git.llvm.org/git/compiler-rt.git cd llvm/projects git clone https://git.llvm.org/git/openmp.git cd llvm/projects git clone https://git.llvm.org/git/libcxx.git git clone https://git.llvm.org/git/libcxxabi.git cd llvm/projects git clone https://git.llvm.org/git/test-suite.git # Build mkdir build cd build cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="ARM;X86;AArch64;BPF" make make check-all sudo make install sudo ldconfig
2.3 Rebuild luaJIT for Arm64
# upstream source do not support Arm64; # So get Arm support source from: http://luajit.org/download.html wget http://luajit.org/download/LuaJIT-2.1.0-beta3.tar.gz # Build Arm64 Lua tar zxvf LuaJIT-2.1.0-beta3.tar.gz make && make install
2.4 Install BCC on Arm64
git clone https://github.com/iovisor/bcc.git mkdir build cd build cmake .. -DCMAKE_INSTALL_PREFIX=/usr make sudo make install sudo ldconfig
3. BCC Tools Commands
These tools include execsnoop, opensnoop, ext4slower, biolatency, tcpconnect, oomkill, runqlat, etc.
They are easy to learn and use:
execsnoop
Trace new processes via exec() syscalls, and print the parent process name and other details:
# execsnoop PCOMM PID RET ARGS bash 15887 0 /usr/bin/man ls preconv 15894 0 /usr/bin/preconv -e UTF-8 man 15896 0 /usr/bin/tbl man 15897 0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8 man 15898 0 /usr/bin/pager -s nroff 15900 0 /usr/bin/locale charmap nroff 15901 0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n groff 15902 0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8 groff 15903 0 /usr/bin/grotty [...]
opensnoop
Trace open() syscalls and print process name and path name details:
# opensnoop PID COMM FD ERR PATH 27159 catalina.sh 3 0 /apps/tomcat8/bin/setclasspath.sh 4057 redis-server 5 0 /proc/4057/stat 2360 redis-server 5 0 /proc/2360/stat 30668 sshd 4 0 /proc/sys/kernel/ngroups_max 30668 sshd 4 0 /etc/group 30668 sshd 4 0 /root/.ssh/authorized_keys 30668 sshd 4 0 /root/.ssh/authorized_keys 30668 sshd -1 2 /var/run/nologin 30668 sshd -1 2 /etc/nologin 30668 sshd 4 0 /etc/login.defs 30668 sshd 4 0 /etc/passwd 30668 sshd 4 0 /etc/shadow 30668 sshd 4 0 /etc/localtime 4510 snmp-pass 4 0 /proc/cpuinfo [...]
- ext4slower
Trace slow ext4 operations that are slower than a provided threshold (bcc has versions of this for btrfs, XFS, and ZFS as well):
# ext4slower 1 Tracing ext4 operations slower than 1 ms TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME 06:49:17 bash 3616 R 128 0 7.75 cksum 06:49:17 cksum 3616 R 39552 0 1.34 [ 06:49:17 cksum 3616 R 96 0 5.36 2to3-2.7 06:49:17 cksum 3616 R 96 0 14.94 2to3-3.4 06:49:17 cksum 3616 R 10320 0 6.82 411toppm 06:49:17 cksum 3616 R 65536 0 4.01 a2p 06:49:17 cksum 3616 R 55400 0 8.77 ab 06:49:17 cksum 3616 R 36792 0 16.34 aclocal-1.14
biolatency
Summarize block device I/O latency as a histogram every second:
# biolatency -mT 1 Tracing block device I/O... Hit Ctrl-C to end. 21:33:40 msecs : count distribution 0 -> 1 : 69 |****************************************| 2 -> 3 : 16 |********* | 4 -> 7 : 6 |*** | 8 -> 15 : 21 |************ | 16 -> 31 : 16 |********* | 32 -> 63 : 5 |** | 64 -> 127 : 1 | | 21:33:41 msecs : count distribution 0 -> 1 : 60 |************************ | 2 -> 3 : 100 |****************************************| 4 -> 7 : 41 |**************** | 8 -> 15 : 11 |**** | 16 -> 31 : 9 |*** | 32 -> 63 : 6 |** | 64 -> 127 : 4 |* | [...]
biosnoop
Trace block device I/O with process, disk, and latency details:
# biosnoop TIME(s) COMM PID DISK T SECTOR BYTES LAT(ms) 0.000004001 supervise 1950 xvda1 W 13092560 4096 0.74 0.000178002 supervise 1950 xvda1 W 13092432 4096 0.61 0.001469001 supervise 1956 xvda1 W 13092440 4096 1.24 0.001588002 supervise 1956 xvda1 W 13115128 4096 1.09 1.022346001 supervise 1950 xvda1 W 13115272 4096 0.98 1.022568002 supervise 1950 xvda1 W 13188496 4096 0.93 1.023534000 supervise 1956 xvda1 W 13188520 4096 0.79 1.023585003 supervise 1956 xvda1 W 13189512 4096 0.60
cachestat
Show the page cache hit/miss ratio and size, and summarize every second:
# cachestat HITS MISSES DIRTIES READ_HIT% WRITE_HIT% BUFFERS_MB CACHED_MB 170610 41607 33 80.4% 19.6% 11 288 157693 6149 33 96.2% 3.7% 11 311 174483 20166 26 89.6% 10.4% 12 389 434778 35 40 100.0% 0.0% 12 389 [...]
tcpconnect
Trace TCP active connections (connect()):
# tcpconnect PID COMM IP SADDR DADDR DPORT 25333 recordProgra 4 127.0.0.1 127.0.0.1 28527 25338 curl 4 100.66.3.172 52.22.109.254 80 25340 curl 4 100.66.3.172 31.13.73.36 80 25342 curl 4 100.66.3.172 104.20.25.153 80 25220 ssh 6 fe80::8a3:9dff:fed5:6b19 fe80::8a3:9dff:fed5:6b19 22 [...]
tcpaccept
Trace TCP passive connections (accept()):
# tcpaccept PID COMM IP RADDR LADDR LPORT 2287 sshd 4 11.16.213.254 100.66.3.172 22 4057 redis-server 4 127.0.0.1 127.0.0.1 28527 4057 redis-server 4 127.0.0.1 127.0.0.1 28527 4057 redis-server 4 127.0.0.1 127.0.0.1 28527 2287 sshd 6 fe80::8a3:9dff:fed5:6b19 fe80::8a3:9dff:fed5:6b19 22 4057 redis-server 4 127.0.0.1 127.0.0.1 28527 [...]
tcpretrans
Trace TCP retransmits and TLPs:
# tcpretrans TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE 01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED 01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED 01:55:17 0 4 10.153.223.157:22 R> 69.53.245.40:22957 ESTABLISHED [...]
gethostlatency
Show latency for getaddrinfo/gethostbyname[2] library calls, system wide:
# gethostlatency TIME PID COMM LATms HOST 06:10:24 28011 wget 90.00 www.iovisor.org 06:10:28 28127 wget 0.00 www.iovisor.org 06:11:25 29404 curl 72.00 foo 06:11:28 29475 curl 1.00 foo [...]
runqlat
Show run queue (scheduler) latency as a histogram, every 5 seconds:
# runqlat -m 5 Tracing run queue latency... Hit Ctrl-C to end. msecs : count distribution 0 -> 1 : 2085 |****************************************| 2 -> 3 : 8 | | 4 -> 7 : 20 | | 8 -> 15 : 191 |*** | 16 -> 31 : 420 |******** | msecs : count distribution 0 -> 1 : 1798 |****************************************| 2 -> 3 : 11 | | 4 -> 7 : 45 |* | 8 -> 15 : 441 |********* | 16 -> 31 : 1030 |********************** | [...]
profile
Sample stack traces at 49 Hertz, then print unique stacks with the number of occurrences seen:
# profile Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end. ^C [...] ffffffff811a2eb0 find_get_entry ffffffff811a338d pagecache_get_page ffffffff811a51fa generic_file_read_iter ffffffff81231f30 __vfs_read ffffffff81233063 vfs_read ffffffff81234565 SyS_read ffffffff818739bb entry_SYSCALL_64_fastpath 00007f4757ff9680 read - dd (14283) 29 [...]