eBPF: Empowering Network Packet Analysis in Application Development

Hemant Rawat
Dec 30, 2023



How can I develop applications that efficiently analyze incoming and outgoing traffic?

Historically, two methods have commonly been used for this purpose: raw sockets and kernel modules.

1. Raw Socket

Linux provides the raw socket type, allowing developers to write packet-processing applications in userspace. These applications use the socket() syscall to create a socket, specifically requesting a raw one (e.g., socket(AF_PACKET, SOCK_RAW, ...)).

Figure 1: Sockets

Data packets are copied from kernel space to user space. The application developer then has access to the Ethernet, IP, and data layers of the incoming or outgoing payload and can modify them as needed. Nevertheless, the Linux kernel retains responsibility for forwarding decisions such as routing-table and interface lookups.

This approach requires every packet to be copied into user space, which introduces latency.
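To make this concrete, here is a minimal sketch of a raw-socket sniffer in C (the buffer size is an arbitrary choice, and the program needs root or CAP_NET_RAW):

```c
#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>        /* htons, ntohs, inet_ntop */
#include <linux/if_ether.h>   /* ETH_P_ALL, struct ethhdr */
#include <linux/ip.h>         /* struct iphdr */

int main(void)
{
    /* AF_PACKET + SOCK_RAW delivers complete L2 frames to user space. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    unsigned char buf[65536];
    for (;;) {
        /* Each packet is copied from the kernel into this buffer. */
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
        if (n < (ssize_t)(sizeof(struct ethhdr) + sizeof(struct iphdr)))
            continue;

        struct ethhdr *eth = (struct ethhdr *)buf;
        if (ntohs(eth->h_proto) != ETH_P_IP)
            continue;

        struct iphdr *ip = (struct iphdr *)(buf + sizeof(struct ethhdr));
        char src[16], dst[16];
        inet_ntop(AF_INET, &ip->saddr, src, sizeof(src));
        inet_ntop(AF_INET, &ip->daddr, dst, sizeof(dst));
        printf("%s -> %s proto=%u\n", src, dst, ip->protocol);
    }
}
```

Note how the per-packet recvfrom() copy is built into the model: every frame crosses the kernel/user boundary before the application can even look at it.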

2. Kernel Module

Linux provides the concept of loadable kernel modules (LKMs). These modules consist of code segments that are loaded into or unloaded from the kernel space based on demand. Components like network drivers and USB drivers are examples of functionalities implemented through these kernel modules.

Figure 2: Netfilter kernel module

The netfilter subsystem offers multiple hook points within the Linux networking stack for packet interception, including hooks for locally generated packets, for packets before the routing decision, and for packets after the routing decision. You can insert custom kernel modules at these points to intercept, modify, and analyze packets; these loadable kernel modules run directly as kernel-space code.

The downside is that any memory-access error in such a module can trigger a kernel panic, resulting in a system crash.
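For illustration, here is a minimal sketch of a netfilter hook module; the module and function names are my own, and it would be built out-of-tree against the running kernel's headers:

```c
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>
#include <net/net_namespace.h>

/* Called for every IPv4 packet before the routing decision. */
static unsigned int hook_fn(void *priv, struct sk_buff *skb,
                            const struct nf_hook_state *state)
{
    struct iphdr *iph = ip_hdr(skb);

    pr_info("packet from %pI4 to %pI4\n", &iph->saddr, &iph->daddr);
    return NF_ACCEPT;   /* let the packet continue through the stack */
}

static struct nf_hook_ops ops = {
    .hook     = hook_fn,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,
    .priority = NF_IP_PRI_FIRST,
};

static int __init demo_init(void)
{
    return nf_register_net_hook(&init_net, &ops);
}

static void __exit demo_exit(void)
{
    nf_unregister_net_hook(&init_net, &ops);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

A single bad pointer dereference in hook_fn runs with full kernel privileges — there is no sandbox, which is exactly the risk described above.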

As observed, both raw sockets and kernel modules come with distinct limitations. Now, let’s explore an alternative that enables the safe execution of packet-processing programs within the kernel, without the per-packet copy to user space.

3. BPF

“BPF” originally stood for “Berkeley Packet Filter”; it got its start as a simple language for writing packet-filtering code for utilities like tcpdump, Wireshark, and Open vSwitch.

BPF is a highly flexible and efficient virtual-machine-like construct in the Linux kernel that executes bytecode at various hook points in a safe manner.

A BPF program can be attached to a kernel function so that, whenever that function executes, the BPF program is executed with it.

However, as additional use cases emerged, the concept of Extended BPF (eBPF) was introduced.

Given its origin, eBPF is especially suited to writing network programs: it is possible to write programs that attach to a network socket to filter traffic, classify traffic, and run network-classifier actions.

eBPF is also useful for debugging the kernel and carrying out performance analysis; programs can be attached to tracepoints, kprobes, and perf events. Because eBPF programs can access kernel data structures, developers can write and test new debugging code without having to recompile the kernel.

An eBPF program is “attached” to a designated code path in the kernel. When the code path is traversed, any attached eBPF programs are executed.

eBPF introduced an enhanced virtual machine, BPF maps, and a range of new program types and use cases such as kprobe, XDP, lightweight transformation, among others.

Figure 3: BPF Architecture

How does the kernel ensure program safety?

With the introduction of eBPF, more flexibility was added to BPF. The kernel still validates every BPF program before loading it, which means it ensures the program terminates. Classic BPF achieved this by forbidding jump-back instructions and capping the total number of instructions.

Historically, the limit was 4,096 BPF instructions per program (raised to one million for privileged programs in kernel 5.2), which by design means any program terminates quickly. Although the instruction set contains forward as well as backward jumps, the in-kernel BPF verifier rejects unbounded loops so that termination is always guaranteed (bounded loops have been accepted since kernel 5.3). Since BPF programs run inside the kernel, the verifier’s job is to make sure they are safe to run and do not affect the system’s stability. There is also the concept of tail calls, which allows one BPF program to jump into another.

i. BPF Verifier

Upon loading a BPF program, the BPF verifier validates the bytecode by checking for various classes of errors. The following are a few of the things the BPF verifier checks (a sketch of the memory-safety rule follows the list):

· Invalid or out-of-bounds memory access (e.g., with respect to SKBs)

· Null pointer checking

· Unbounded loops

· Program control flow must form a DAG (no backward loops; every path reaches an exit)

· Too many arguments in a function
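To make the memory-safety rule concrete, here is a sketch of the bounds-check pattern the verifier demands in an XDP program; without the explicit comparison against data_end, loading the program fails verification:

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int bounds_demo(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* Without this check, the verifier rejects the program: it cannot
     * prove that reading eth->h_proto stays inside the packet. */
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    if (eth->h_proto == bpf_htons(ETH_P_IP))
        bpf_printk("IPv4 packet seen");
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```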

ii. BPF Compiler

BPF in kernel space is implemented as a virtual machine. This virtual machine includes a JIT compiler, which translates the eBPF instruction set into the host’s native instruction set.

eBPF programs have their own instruction set (bytecode). Compilers take restricted C (or Rust) source and convert it into eBPF bytecode; currently, the de facto compiler is Clang/LLVM.

iii. BPF Syscalls

BPF syscalls are made from user space in order to perform various operations in kernel space. The following are some of the things the syscalls can do:

· Load/Unload BPF program

· Pin/Unpin a BPF program to the filesystem

· Create/Delete BPF maps

· BPF map CRUD (create, read, update, and delete entries)

Typically, you will not call these BPF syscalls directly, as userspace libraries and programs do this for you — e.g., bpftool, the Cilium eBPF Go library, tc (traffic control), and libbpf. A minimal libbpf example follows.
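Here is a minimal sketch of such a userspace loader using modern libbpf (the object file prog.o, the program name count_packets, and the interface eth0 are all hypothetical):

```c
#include <stdio.h>
#include <unistd.h>
#include <net/if.h>        /* if_nametoindex */
#include <bpf/libbpf.h>

int main(void)
{
    /* Open and load the compiled eBPF object file. */
    struct bpf_object *obj = bpf_object__open_file("prog.o", NULL);
    if (!obj || bpf_object__load(obj)) {
        fprintf(stderr, "failed to open/load prog.o\n");
        return 1;
    }

    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "count_packets");
    if (!prog)
        return 1;

    /* Attach the XDP program to the eth0 interface. */
    struct bpf_link *link =
        bpf_program__attach_xdp(prog, if_nametoindex("eth0"));
    if (!link)
        return 1;

    puts("attached; Ctrl-C to exit");
    pause();

    bpf_link__destroy(link);
    bpf_object__close(obj);
    return 0;
}
```

Under the hood, open/load/attach each boil down to bpf() syscalls (BPF_PROG_LOAD and friends); the library just hides the plumbing.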

iv. BPF Maps

Applications in user space interact with BPF programs via “BPF maps”. These are generic data structures in the form of key/value pairs that act as a bridge between user space and kernel space.
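As a sketch, the hypothetical count_packets program loaded in the previous example could use a one-element array map to export a packet counter to user space (the map name pkt_count is mine):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* A one-element array map shared with user space. */
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *value = bpf_map_lookup_elem(&pkt_count, &key);

    if (value)                          /* the verifier requires this check */
        __sync_fetch_and_add(value, 1); /* atomic: many CPUs run this hook */
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

User space can then read the counter through the map’s file descriptor — for example with bpf_map_lookup_elem() from libbpf, or with bpftool map dump.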

v. BPF Helper API

BPF programs have access to the bpf-helpers header. The functions in this API hide a lot of the complexity of dealing with BPF programs. Here are some useful things bpf-helpers provides (a short sketch follows the list):

· Debug printout (GPL only)

· Resizing SKB headroom/tailroom

· Copying data from an SKB into stack memory

· Internet Checksum recalculation for IP/TCP/UDP
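A short sketch using two of these helpers from a tc classifier program (the offsets and names are illustrative):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tc")
int inspect(struct __sk_buff *skb)
{
    __u8 buf[14];

    /* bpf_skb_load_bytes: copy SKB data onto the BPF stack. */
    if (bpf_skb_load_bytes(skb, 0, buf, sizeof(buf)) < 0)
        return 0;   /* TC_ACT_OK */

    /* bpf_printk: debug output to the trace pipe (GPL programs only). */
    bpf_printk("first bytes of dst MAC: %x %x", buf[0], buf[1]);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```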

vi. eBPF: Data Stores

eBPF introduced a variety of data structures for storing states. They can be accessed from a BPF program in order to keep state among multiple BPF program invocations. They can also be accessed through file descriptors from user space that can be arbitrarily shared with other BPF programs or user space applications.

vii. Bytecode

BPF provides the following basic instruction classes (a minimal hand-assembled example follows the list):

· BPF_LD and variants: load operations

· BPF_ST and variants: store operations

· BPF_ALU and variants: all arithmetic operations

· BPF_JMP and variants: jump instructions

· BPF_CALL: call kernel helper functions or other BPF programs; a tail call is also possible, where execution never returns to the calling program

· BPF_EXIT: terminate the program and return the value in r0
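To give a feel for the instruction format, here is the smallest valid program — “return 0” — hand-assembled as raw instructions; in practice a compiler emits these for you:

```c
#include <linux/bpf.h>   /* struct bpf_insn, BPF_* opcode constants */

/* "r0 = 0; exit" — every program must set r0 (the return value)
 * before executing BPF_EXIT. */
struct bpf_insn prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K,      /* r0 = 0    */
      .dst_reg = BPF_REG_0, .imm = 0 },
    { .code = BPF_JMP | BPF_EXIT },             /* return r0 */
};
```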

Writing your BPF program

Socket buffers (SKBs) are the fundamental data structure of the Linux networking stack, and the majority of BPF program callbacks deal directly with them.

Figure 4: Socket Buffer structure

Historically, it was necessary to write BPF assembly and use the kernel’s bpf_asm assembler to generate the bytecode. Fortunately, the LLVM Clang compiler has grown support for an eBPF backend that compiles C into bytecode. Object files containing this bytecode can then be loaded directly with the bpf() system call and the BPF_PROG_LOAD command. To make it easier to write eBPF programs, the kernel provides the libbpf library, which includes helper functions for loading programs and for creating and manipulating eBPF objects. The BCC project includes a complete toolchain for writing eBPF programs and loading them without linking against the kernel source tree.
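Putting the pieces together, here is a sketch of a complete XDP program that parses the Ethernet and IP headers and drops TCP traffic (the names are mine; it would be compiled with something like clang -O2 -g -target bpf -c drop_tcp.c -o drop_tcp.o):

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>          /* IPPROTO_TCP */
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_tcp(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    /* Drop TCP segments at the driver, before any SKB is allocated. */
    if (ip->protocol == IPPROTO_TCP)
        return XDP_DROP;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```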

Classical vs eBPF

Classical BPF ISA

· Two 32-bit registers: A, X

· Implicit stack of 16 32-bit slots (LD_MEM, ST_MEM insns)

· Full integer arithmetic

· Explicit load/store from packet (LD_ABS, LD_IND insns)

· Conditional branches (with two destinations: jump true/false)

eBPF

· Introduced ten 64-bit general-purpose registers, plus a read-only frame pointer

· More variants for LD/ST/JMP instructions

· CALL opcode introduced to call other BPF programs and kernel helper functions

· Introduction of data stores which are shared with user space and other BPF programs running in kernel

· ISA was re-architected to be close to real assembly

· JIT for BPF was introduced for faster performance

· More hook points were introduced

· Backward jumps were introduced (though the verifier still constrains loops)

Hook points: things you can attach an eBPF program to (illustrative SEC() names follow the list):

· Network cards, with XDP

· tc egress/ingress (in network stack)

· kprobes (any kernel function)

· uprobes (any userspace function, like any C program with symbols)

· USDT probes (statically defined probes originally built for DTrace)

· the JVM

· tracepoints

· seccomp / Landlock security hooks
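With libbpf, the hook point is normally selected by the program’s ELF section name. A few illustrative SEC() annotations (the attach targets are examples of mine, not an exhaustive list):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")                                   /* a network device, via XDP */
int on_xdp(struct xdp_md *ctx) { return XDP_PASS; }

SEC("tc")                                    /* tc ingress/egress */
int on_tc(struct __sk_buff *skb) { return 0; }

SEC("kprobe/tcp_v4_connect")                 /* entry of a kernel function */
int on_kprobe(void *ctx) { return 0; }

SEC("tracepoint/syscalls/sys_enter_execve")  /* a stable tracepoint */
int on_tp(void *ctx) { return 0; }

char LICENSE[] SEC("license") = "GPL";
```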

BPF LIMITATIONS:

· BPF is very limited from a bytecode perspective, which rules out general function calls, among other things

· Loops are prohibited within BPF, making it unfeasible to iterate over headers or execute regex operations

· The BPF interpreter resides only in the kernel

· Although BPF supports global data structures (maps) for sharing state, which makes it easy to craft a request/response and expose it to BPF, arbitrary user-specified data structures cannot be supported

· Performing string manipulation poses a challenge within the confines of the BPF interpreter’s capabilities

Conclusion: Why BPF?

· Performance: executing programs in kernel space, with atomic operations performed there, yields significant performance gains by minimizing user-space involvement

· Secure: BPF programs operate within a sandboxed virtual machine, ensuring a secure execution environment

· HW offloading: BPF programs can be offloaded to hardware devices such as SmartNICs. This not only reduces the work done in the kernel but can bypass the kernel for those packets altogether.

