BPF Compiler Collection (BCC)—— A profiling Tool from Kernel Perspective


1.Overview

BPF (Berkeley Packet Filter) were added to the Linux 4.x series kernels, allowing BPF to do much more than just filtering packets. These enhancements allow custom analysis programs to be executed on Linux dynamic tracing, static tracing, and profiling events. Linux Enhanced BPF (eBPF) has raw tracing capabilities similar to those of DTrace and SystemTap. eBPF can also be used for security modules and software defined networks.

Also note: eBPF is often called just "BPF", especially on lkml.

BPF originated as a technology for optimizing packet filters. If you run tcpdump with an expression (matching on a host or port), it gets compiled into optimal BPF bytecode which is executed by an in-kernel sandboxed virtual machine. Enhanced BPF (aka eBPF, but also just BPF) extended what this BPF virtual machine could do: allowing it to run on events other than packets, and do actions other than filtering.

eBPF can be used to for software defined networks, DDoS mitigation (early packet drop), improving network performance (eXpress Data Path), intrusion detection, and more. On this page I'm focusing on its use for observability tools, where it is used as shown in the following workflow:

BCC  is a toolkit for creating efficient kernel tracing and manipulation programs, and includes several useful tools and examples. It makes use of extended BPF (Berkeley Packet Filters), formally known as eBPF, a new feature that was first added to Linux 3.15. Much of what BCC uses requires Linux 4.1 and above. It provides a large collection of tracing examples tools, as well as Python and Lua interfaces for developing them. 


2.  Enable BCC on Arm64

See the bcc install instructions for getting started on different Linux distros.

But some difference on Arm64 platform. Here are Ubuntu instructions, as an example:

2.1  Prerequisites

Rebuild Kernel with the following flags set;

CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
# [optional, for tc filters]
CONFIG_NET_CLS_BPF=m
# [optional, for tc actions]
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
# [optional, for kprobes]
CONFIG_BPF_EVENTS=y


CONFIG_NET_SCH_SFQ=m
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_DUMMY=m
CONFIG_VXLAN=m



2.2  LLvm Arm64 rebuild


# get source and build
git clone https://git.llvm.org/git/llvm.git
cd llvm/tools
git clone https://git.llvm.org/git/clang.git
cd llvm/projects
git clone https://git.llvm.org/git/compiler-rt.git
cd llvm/projects
git clone https://git.llvm.org/git/openmp.git
cd llvm/projects
git clone https://git.llvm.org/git/libcxx.git
git clone https://git.llvm.org/git/libcxxabi.git
cd llvm/projects
git clone https://git.llvm.org/git/test-suite.git


# Build
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="ARM;X86;AArch64;BPF"
make
make check-all
sudo make install
sudo ldconfig


2.3  Rebuild luaJIT for Arm64

# upstream source do not support Arm64;
# So get Arm support source from: http://luajit.org/download.html
wget http://luajit.org/download/LuaJIT-2.1.0-beta3.tar.gz


# Build Arm64 Lua
tar zxvf LuaJIT-2.1.0-beta3.tar.gz
make && make install


2.4 Install BCC on Arm64

git clone https://github.com/iovisor/bcc.git
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr
make
sudo make install
sudo ldconfig


3. BCC Tools Commands


These tools include execsnoop, opensnoop, ext4slower, biolatency, tcpconnect, oomkill, runqlat, etc.

They are easy to learn and use: 

  • execsnoop

          Trace new processes via exec() syscalls, and print the parent process name and other details:

# execsnoop
PCOMM            PID    RET ARGS
bash             15887    0 /usr/bin/man ls
preconv          15894    0 /usr/bin/preconv -e UTF-8
man              15896    0 /usr/bin/tbl
man              15897    0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
man              15898    0 /usr/bin/pager -s
nroff            15900    0 /usr/bin/locale charmap
nroff            15901    0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n
groff            15902    0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8
groff            15903    0 /usr/bin/grotty
[...]


  • opensnoop

          Trace open() syscalls and print process name and path name details:

# opensnoop
PID    COMM               FD ERR PATH
27159  catalina.sh         3   0 /apps/tomcat8/bin/setclasspath.sh
4057   redis-server        5   0 /proc/4057/stat
2360   redis-server        5   0 /proc/2360/stat
30668  sshd                4   0 /proc/sys/kernel/ngroups_max
30668  sshd                4   0 /etc/group
30668  sshd                4   0 /root/.ssh/authorized_keys
30668  sshd                4   0 /root/.ssh/authorized_keys
30668  sshd               -1   2 /var/run/nologin
30668  sshd               -1   2 /etc/nologin
30668  sshd                4   0 /etc/login.defs
30668  sshd                4   0 /etc/passwd
30668  sshd                4   0 /etc/shadow
30668  sshd                4   0 /etc/localtime
4510   snmp-pass           4   0 /proc/cpuinfo
[...]


  • ext4slower

          Trace slow ext4 operations that are slower than a provided threshold (bcc has versions of this for btrfs, XFS, and ZFS as well):

# ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
06:49:17 bash           3616   R 128     0           7.75 cksum
06:49:17 cksum          3616   R 39552   0           1.34 [
06:49:17 cksum          3616   R 96      0           5.36 2to3-2.7
06:49:17 cksum          3616   R 96      0          14.94 2to3-3.4
06:49:17 cksum          3616   R 10320   0           6.82 411toppm
06:49:17 cksum          3616   R 65536   0           4.01 a2p
06:49:17 cksum          3616   R 55400   0           8.77 ab
06:49:17 cksum          3616   R 36792   0          16.34 aclocal-1.14


  • biolatency

        Summarize block device I/O latency as a histogram every second:

# biolatency -mT 1
Tracing block device I/O... Hit Ctrl-C to end.
21:33:40
     msecs               : count     distribution
         0 -> 1          : 69       |****************************************|
         2 -> 3          : 16       |*********                               |
         4 -> 7          : 6        |***                                     |
         8 -> 15         : 21       |************                            |
        16 -> 31         : 16       |*********                               |
        32 -> 63         : 5        |**                                      |
        64 -> 127        : 1        |                                        |

21:33:41
     msecs               : count     distribution
         0 -> 1          : 60       |************************                |
         2 -> 3          : 100      |****************************************|
         4 -> 7          : 41       |****************                        |
         8 -> 15         : 11       |****                                    |
        16 -> 31         : 9        |***                                     |
        32 -> 63         : 6        |**                                      |
        64 -> 127        : 4        |*                                       |
[...]


  • biosnoop

         Trace block device I/O with process, disk, and latency details:

# biosnoop
TIME(s)        COMM           PID    DISK    T  SECTOR    BYTES   LAT(ms)
0.000004001    supervise      1950   xvda1   W  13092560  4096       0.74
0.000178002    supervise      1950   xvda1   W  13092432  4096       0.61
0.001469001    supervise      1956   xvda1   W  13092440  4096       1.24
0.001588002    supervise      1956   xvda1   W  13115128  4096       1.09
1.022346001    supervise      1950   xvda1   W  13115272  4096       0.98
1.022568002    supervise      1950   xvda1   W  13188496  4096       0.93
1.023534000    supervise      1956   xvda1   W  13188520  4096       0.79
1.023585003    supervise      1956   xvda1   W  13189512  4096       0.60


  • cachestat

        Show the page cache hit/miss ratio and size, and summarize every second:

# cachestat
    HITS   MISSES  DIRTIES  READ_HIT% WRITE_HIT%   BUFFERS_MB  CACHED_MB
  170610    41607       33      80.4%      19.6%           11        288
  157693     6149       33      96.2%       3.7%           11        311
  174483    20166       26      89.6%      10.4%           12        389
  434778       35       40     100.0%       0.0%           12        389
[...]


  • tcpconnect

        Trace TCP active connections (connect()):

# tcpconnect
PID    COMM         IP SADDR            DADDR            DPORT
25333  recordProgra 4  127.0.0.1        127.0.0.1        28527
25338  curl         4  100.66.3.172     52.22.109.254    80
25340  curl         4  100.66.3.172     31.13.73.36      80
25342  curl         4  100.66.3.172     104.20.25.153    80
25220  ssh          6  fe80::8a3:9dff:fed5:6b19 fe80::8a3:9dff:fed5:6b19 22
[...]


  • tcpaccept

          Trace TCP passive connections (accept()):

# tcpaccept
PID    COMM         IP RADDR            LADDR            LPORT
2287   sshd         4  11.16.213.254    100.66.3.172     22  
4057   redis-server 4  127.0.0.1        127.0.0.1        28527
4057   redis-server 4  127.0.0.1        127.0.0.1        28527
4057   redis-server 4  127.0.0.1        127.0.0.1        28527
2287   sshd         6  fe80::8a3:9dff:fed5:6b19 fe80::8a3:9dff:fed5:6b19 22  
4057   redis-server 4  127.0.0.1        127.0.0.1        28527
[...]


  • tcpretrans

         Trace TCP retransmits and TLPs:

# tcpretrans
TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE
01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
01:55:17 0      4  10.153.223.157:22    R> 69.53.245.40:22957   ESTABLISHED
[...]


  • gethostlatency

         Show latency for getaddrinfo/gethostbyname[2] library calls, system wide:

# gethostlatency
TIME      PID    COMM          LATms HOST
06:10:24  28011  wget          90.00 www.iovisor.org
06:10:28  28127  wget           0.00 www.iovisor.org
06:11:25  29404  curl          72.00 foo
06:11:28  29475  curl           1.00 foo 
[...]


  • runqlat

        Show run queue (scheduler) latency as a histogram, every 5 seconds:

# runqlat -m 5
Tracing run queue latency... Hit Ctrl-C to end.

     msecs               : count     distribution
         0 -> 1          : 2085     |****************************************|
         2 -> 3          : 8        |                                        |
         4 -> 7          : 20       |                                        |
         8 -> 15         : 191      |***                                     |
        16 -> 31         : 420      |********                                |

     msecs               : count     distribution
         0 -> 1          : 1798     |****************************************|
         2 -> 3          : 11       |                                        |
         4 -> 7          : 45       |*                                       |
         8 -> 15         : 441      |*********                               |
        16 -> 31         : 1030     |**********************                  |
[...]

  • profile

         Sample stack traces at 49 Hertz, then print unique stacks with the number of occurrences seen:

# profile
Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
^C 
[...]
    ffffffff811a2eb0 find_get_entry
    ffffffff811a338d pagecache_get_page
    ffffffff811a51fa generic_file_read_iter
    ffffffff81231f30 __vfs_read
    ffffffff81233063 vfs_read
    ffffffff81234565 SyS_read
    ffffffff818739bb entry_SYSCALL_64_fastpath
    00007f4757ff9680 read
    -                dd (14283)
        29
[...]