Understanding nftables: The Linux Packet Filtering Framework

9 min read · Jun 24, 2024

Packet filtering is crucial for securing systems and managing network traffic. Let’s look at nftables, the replacement for iptables.

First, let’s look at the path a packet takes:

The packet first arrives at the NIC from the network
The packet is copied into the kernel memory ring buffer via DMA (Direct Memory Access)
A hardware interrupt is generated to let the system know that the packet is in memory
The device driver calls into NAPI (New API) to start a poll loop
ksoftirqd pulls the packet out of the ring buffer via the NAPI poll function
The memory regions that held the packet in the ring buffer are unmapped
The DMA’ed data is passed up to the networking layer as a socket buffer (skb)
Packet steering distributes the packet to CPU queues for processing (RSS and RPS can have an effect here)
The packet is moved from these queues to the protocol layer (TCP, UDP, etc.)
The protocol layer moves the skb into the buffer attached to the destination socket
The application processes the packet

That is a lot of steps, so let’s look at the main pieces:

NAPI

NAPI (“New API”) is an extension to the device driver packet processing framework, designed to improve the performance of high-speed networking. It provides the following two key benefits:

Interrupt Mitigation: High-speed networking can generate thousands of interrupts per second, repeatedly notifying the system about the large number of packets awaiting processing. NAPI mitigates this by allowing drivers to operate with some interrupts disabled during high traffic periods, thereby reducing system load.
Packet throttling: When the system is overwhelmed and needs to drop packets, it’s more efficient to discard them early in the process. NAPI-compliant drivers can often ensure that packets are dropped directly by the network adapter, preventing them from reaching the kernel.
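The interrupt mitigation idea can be sketched with a toy model. This is not kernel code: the class name ToyNapiDevice, its methods and the budget of 64 are all hypothetical, chosen only to show how one interrupt can cover many packets once the driver switches to polling.

```python
from collections import deque


class ToyNapiDevice:
    """Toy model of NAPI-style interrupt mitigation (illustrative only)."""

    def __init__(self, budget=64):
        self.ring = deque()       # stands in for the RX ring buffer
        self.irq_enabled = True   # are RX interrupts currently enabled?
        self.interrupts = 0       # interrupts raised so far
        self.processed = 0        # packets handed to the stack so far
        self.budget = budget      # max packets handled per poll call

    def receive(self, n):
        """n packets arrive from the wire."""
        self.ring.extend(range(n))
        if self.irq_enabled:
            # Raise one interrupt, then keep RX interrupts off
            # until polling has drained the ring.
            self.interrupts += 1
            self.irq_enabled = False

    def poll(self):
        """Process up to `budget` packets; re-enable IRQs when the ring is empty."""
        done = 0
        while self.ring and done < self.budget:
            self.ring.popleft()
            done += 1
        self.processed += done
        if not self.ring:
            self.irq_enabled = True
        return done


dev = ToyNapiDevice(budget=64)
dev.receive(100)   # raises an interrupt and switches to polling
dev.receive(100)   # arrives while polling: no additional interrupt
while dev.poll() == dev.budget:
    pass           # keep polling while the full budget is used
```

In this model, 200 packets cost a single interrupt; a driver without mitigation could raise one per packet.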

Netfilter & IPTables

Netfilter is a framework provided by the kernel that allows networking operations to be implemented via customized handlers. It performs packet filtering, network address translation, and port translation, among other things. It consists of a set of hooks inside the kernel that allow kernel modules to register callback functions within the networking stack.

iptables is a userspace program that configures netfilter via chains and the rules attached to those chains. iptables is for IPv4, ip6tables for IPv6, arptables for ARP, and ebtables for Ethernet frames.

IP Protocol Layer

Fundamentally, the IP Protocol layer in the Linux networking stack works to route traffic as necessary — to the right interface when outgoing, or to the right socket when incoming.

Receiving packet flow: When a packet arrives over the medium, the network interface first checks whether it is intended for the host computer. If it is, the packet is sent to the IP layer, which looks up the route to the destination. If the packet is meant for a local application, it is passed to the transport layer and placed in a socket buffer until the application reads it. If the destination is another computer, the packet is sent out through an output interface.

First, the packet arrives at the ingress hook. The ingress hook is attached to a particular network interface, so filtering can be performed here at an early stage.
The packet then traverses the PREROUTING hook, where it is intercepted just before the kernel makes any routing decision. At this stage the packet has arrived on a network interface but has not yet been processed against the system’s routing table. The most common action performed at this hook is destination address translation (DNAT).
Next comes the routing decision: the kernel matches the destination IP address of the incoming packet against the entries in the routing table. Based on the most specific match, the kernel either routes the packet for local delivery, if the address belongs to the local system, or forwards it to the appropriate network interface or gateway, if the destination address is on a different network.
If the address is local, the packet traverses the INPUT hook, whose main purpose is to decide the fate of incoming packets destined for local processes. If allowed, the traffic reaches the local application.

Sending a packet: When an application generates traffic, it sends its packet through a socket to the transport layer (usually TCP or UDP) and then on to the network layer (IP). Within the IP layer, the kernel determines the route to the destination host by looking it up in the Forwarding Information Base (FIB); kernels older than 3.6 checked a routing cache first. If the packet’s destination is another computer, the kernel addresses it and sends it to the link layer, which ultimately sends the packet out onto the physical medium.

Let’s look at it in terms of hooks.

First, a locally generated packet goes through the routing decision and then to the OUTPUT hook, where we can filter any locally generated outgoing traffic.
The packet then moves to the POSTROUTING hook. At this point the kernel has made the necessary routing decisions. The most common operation at this stage is source address translation (SNAT), where we replace the source IP of the packet with the IP of the outgoing interface so that return traffic knows how to get back.
Finally, the packet goes to the network interface. Forwarded packets take a different path: if a received packet’s destination IP is on a different network, the routing decision sends it to the FORWARD hook. This hook is for packets that are neither locally generated nor destined for the local system; its primary purpose is to handle packets that the system forwards from one network to another. Filtering can be performed here if the system is acting as a router or gateway.
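The three paths described above (local delivery, forwarding, and locally generated traffic) can be summarized in a small sketch. The function name hooks_for and its parameters are hypothetical, and the model is deliberately simplified: it ignores details such as loopback traffic re-entering the receive path.

```python
def hooks_for(origin, dest_is_local=False):
    """Return the netfilter hooks a packet traverses, in order.

    origin: "network" for a packet received on an interface,
            "local" for a packet generated by a local process.
    dest_is_local: for received packets, whether the routing decision
                   found the destination address on this host.
    """
    if origin == "local":
        # Locally generated traffic: routing decision, OUTPUT, POSTROUTING.
        return ["output", "postrouting"]

    path = ["ingress", "prerouting"]
    if dest_is_local:
        path.append("input")                  # local delivery
    else:
        path += ["forward", "postrouting"]    # routed to another network
    return path


for case in [("network", True), ("network", False), ("local", False)]:
    print(case, "->", " -> ".join(hooks_for(*case)))
```

Reading the three printed paths side by side shows why a filter meant for forwarded traffic has no effect when placed in an input chain.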

Summarizing IPTables

PREROUTING: Packets enter this chain before a routing decision is made
INPUT: The packet is set for local delivery. Local delivery is determined by the “local” routing table and does not depend on whether any process has an open socket:

ip route show table local

FORWARD: All packets that have been routed and are not for local delivery traverse this chain
OUTPUT: Packets sent from the machine itself traverse this chain
POSTROUTING: Packets enter this chain just before they are handed off to the hardware. The routing decision has already been made.

TCP Stack

Sending: The application writes into the TCP send buffer, the TCP state machine is updated, the data goes through packet processing, and the packet is then passed to the IP layer.
Receiving: After a packet is pushed up from the IP layer, TCP processing is scheduled to handle it. Each packet goes through a series of processing steps, the TCP state machine is updated, and the packet is stored in the receive buffer.

TCP Tuning

Before we get into nftables it is worth looking at some TCP tuning mechanisms:

net.core.rmem_default, net.core.rmem_max, net.core.wmem_default and net.core.wmem_max set the default and maximum receive and send buffer sizes for all types of sockets.
Number of packets allowed outstanding = min {congestion window (cwnd), receiver’s advertised window (rwnd)}
The send buffer holds all outstanding (unacknowledged) packets, which may need to be retransmitted, as well as data queued to be sent. So congestion window ≤ send buffer (the congestion window can never usefully grow larger than the send buffer can accommodate).
If these buffers are too small, throughput suffers; if they are too large, the number of outstanding packets can grow too large for the end-to-end path and packets will be dropped.
net.ipv4.tcp_rmem & net.ipv4.tcp_wmem are the TCP autotuning settings. Each takes three values: the first is the minimum buffer, guaranteed even under memory pressure; the second is the default buffer for each TCP socket; and the third is the maximum for each TCP socket.
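To make the window arithmetic above concrete, here is a small sketch. The function names and the sample numbers are hypothetical; the throughput bound is the standard window-per-round-trip estimate, not something specific to Linux.

```python
def usable_window(cwnd, rwnd, sndbuf):
    """Bytes the sender can keep in flight.

    The sender is limited by the congestion window, the receiver's
    advertised window, and what its own send buffer can hold.
    """
    return min(cwnd, rwnd, sndbuf)


def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput: one full window per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


# Hypothetical connection: a large congestion window, but a 64 KiB
# receive window and a 100 ms round-trip time.
window = usable_window(cwnd=1_000_000, rwnd=65_536, sndbuf=212_992)
print(window)                              # 65536
print(max_throughput_bps(window, 100))     # 5242880.0 (about 5.2 Mbit/s)
```

In this example the receiver’s window is the bottleneck: raising cwnd or the send buffer alone would not improve throughput.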

Now let’s build a stateful firewall using nftables.

nftables

nftables is a subsystem of the Linux kernel that provides filtering and classification of network packets/datagrams/frames.

nft is the single command-line tool used to interact with nftables and manage all types of packet filtering (IPv4, IPv6, ARP, and network bridges), unlike iptables, where you need a separate tool for each.

The key concepts for nftables are:

Tables: are containers for chains and sets. Different tables are used for different families (e.g., ip, ip6, arp, bridge).
Chains: are containers for rules. Chains are attached to specific hooks in the network stack (e.g., input, output, forward).
Rules: define the filtering logic, specifying conditions and actions for packets.
Sets: are collections of elements (e.g., IP addresses, ports) that can be used within rules.

Additional nftables features include counters, interval merging, and timeouts.

Let’s create a simple firewall using nftables. This firewall will:

Allow all traffic on the loopback interface.
Allow SSH traffic.
Drop all other incoming traffic.

Ensure nftables is installed.

sudo apt-get update
sudo apt-get install nftables

The best way to make changes to nftables is to keep the rules in a configuration file.

#!/usr/sbin/nft -f

# Flush existing rules and tables
flush ruleset

# Define a new table for the IPv4 filter. A table is like a namespace that groups rules together.
# Every table has a family that determines the traffic types it processes:
# ip = IPv4; ip6 = IPv6
table ip filter {
    # Define the input chain that will group rules inside the table
    chain input {
        type filter hook input priority filter; policy drop;
        
        # Allow all traffic on the loopback interface
        iif lo accept
        
        # Accept established and related connections
        ct state established,related accept
        
        # Allow SSH traffic
        tcp dport 22 accept
        
        # All other incoming traffic is dropped by the chain's drop policy
    }
    
    # Define the output chain
    chain output {
        type filter hook output priority 0; policy accept;
    }
    
    # Define the forward chain
    chain forward {
        type filter hook forward priority 0; policy drop;
    }
}

For a chain to process traffic, we need to either jump to it from another chain or attach it to a netfilter hook. A hook, as discussed previously, is a place in the kernel’s netfilter framework where modules can register callback functions that intercept packets and perform actions on them.

Every packet addressed to a local process will be evaluated by the rules in this chain. To attach the chain to the hook, we set the chain type to filter, the hook to input, and the priority to filter. Apart from attaching our filter chain to the input hook, we also define a default chain policy. It tells the firewall what to do with a packet that reaches the end of the chain without matching a rule that issues a verdict: accept lets the packet keep traversing the networking stack, and drop discards it. Individual rules can also use other verdicts, such as reject, jump and return.

Rules are evaluated sequentially from top to bottom, and the expressions within a rule from left to right. In the SSH rule, nftables first checks that the traffic is TCP, then that it is going to port 22, and only then accepts it.
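The evaluation order can be illustrated with a toy evaluator. This is only a model of the behaviour described here, not how nftables is implemented: the evaluate function and the dictionary-based packets are hypothetical, and every rule is assumed to end in a terminal verdict.

```python
def evaluate(rules, policy, packet):
    """Return the verdict for `packet`.

    rules:  list of (matches, verdict) pairs, checked top to bottom.
    policy: verdict applied when no rule matches.
    """
    for matches, verdict in rules:
        # all() checks the matches left to right and stops at the
        # first one that fails, like the expressions in a rule.
        if all(match(packet) for match in matches):
            return verdict
    return policy


# A model of the input chain from the configuration above.
input_chain = [
    ([lambda p: p.get("iif") == "lo"], "accept"),
    ([lambda p: p.get("ct_state") in ("established", "related")], "accept"),
    ([lambda p: p.get("proto") == "tcp",
      lambda p: p.get("dport") == 22], "accept"),
]

ssh = {"iif": "eth0", "ct_state": "new", "proto": "tcp", "dport": 22}
dns = {"iif": "eth0", "ct_state": "new", "proto": "udp", "dport": 53}

print(evaluate(input_chain, "drop", ssh))   # accept
print(evaluate(input_chain, "drop", dns))   # drop
```

The DNS packet matches no rule, so it falls through to the chain policy and is dropped.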

With a drop policy on our input chain (or an explicit drop rule at its end), we block not only inbound traffic coming from outside the server but also traffic originating from within the server via the loopback interface. This loopback traffic should not be blocked, as many services rely on it to operate.

iif lo accept

This rule accepts traffic arriving on the loopback interface.

How do we check whether the configuration works? Simple: add a counter to a rule and read its packet and byte counts with nft list ruleset.

udp dport 1234 counter accept

To allow responses to traffic initiated by the server, we use the connection tracking mechanism (conntrack) inside the kernel, which tracks the state of network connections passing through the system.

conntrack -L

Information from conntrack can be used to manage incoming and outgoing traffic. It is a stateful packet inspection mechanism: we track connections and make decisions based on their state.

ct state established,related counter accept

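The reply packet in the TCP three-way handshake does not yet make the TCP connection established, so why does the rule above work?

When conntrack sees a SYN packet, it considers the connection new. As soon as it sees the SYN-ACK reply, it considers the connection established, because for conntrack “established” means that packets have been seen in both directions. (Conntrack can likewise match an ICMP request with its reply.)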

ct state invalid counter drop

Packets that are not associated with any known or expected connection or protocol are dropped, as they can be malformed or corrupted packets.

We use conntrack to allow only the first SYN packet, the one that creates a new connection.

ct state new tcp dport 22 counter accept

Only the SYN packet that starts the TCP connection will match. All subsequent packets will not match this rule but will match the established rule.
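The new/established distinction can be modelled in a few lines. This is a heavily simplified sketch, not the kernel’s conntrack: the ToyConntrack class is hypothetical, it keys flows only on addresses and ports, and it ignores TCP flags, timeouts and the invalid and related states.

```python
class ToyConntrack:
    """Toy connection tracker: 'new' until a reply is seen, then 'established'."""

    def __init__(self):
        self.table = {}   # original-direction flow tuple -> state

    def classify(self, src, sport, dst, dport):
        key = (src, sport, dst, dport)
        reply_key = (dst, dport, src, sport)

        if key in self.table:
            # Another packet in the original direction.
            return self.table[key]
        if reply_key in self.table:
            # First (or later) packet in the reply direction:
            # traffic has now been seen both ways.
            self.table[reply_key] = "established"
            return "established"

        # First packet of an unknown flow.
        self.table[key] = "new"
        return "new"


ct = ToyConntrack()
print(ct.classify("10.0.0.5", 40000, "10.0.0.1", 22))   # new          (SYN)
print(ct.classify("10.0.0.1", 22, "10.0.0.5", 40000))   # established  (SYN-ACK)
print(ct.classify("10.0.0.5", 40000, "10.0.0.1", 22))   # established  (ACK)
```

Only the first packet is classified as new, which is why a single rule for new SSH connections plus a general established rule is enough to carry the whole session.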

Concatenation in nftables allows you to match on multiple fields in a single rule. This feature can be used to create more complex and specific filtering rules.

Referencing allows you to create reusable sets of elements that can be referenced in multiple rules. This is particularly useful for managing large sets of IP addresses or ports.

Creating a named set and referencing it in a rule:

set allowed_ips {
        typeof ip saddr . tcp dport
        elements = { 192.168.27.2 . 22, 192.168.28.65 . 22 }
}

ct state new ip saddr . tcp dport @allowed_ips accept

This decouples the rules from the list of allowed IPs, so the set can be updated without touching the rules:

nft add element ip filter allowed_ips { 1.1.1.1 . 22 }
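Conceptually, a concatenated set behaves like a set of tuples, and the rule is a membership test against it. The sketch below is only an analogy in Python; the names allowed and rule_accepts are hypothetical.

```python
# Each element pairs a source address with a destination port,
# mirroring the address-and-port elements of the named set.
allowed = {("192.168.27.2", 22), ("192.168.28.65", 22)}


def rule_accepts(ct_state, saddr, dport, allowed_set):
    """Model of a rule that accepts new connections whose (saddr, dport) is in the set."""
    return ct_state == "new" and (saddr, dport) in allowed_set


print(rule_accepts("new", "192.168.27.2", 22, allowed))   # True
print(rule_accepts("new", "1.1.1.1", 22, allowed))        # False

# Adding an element changes the outcome without touching the rule.
allowed.add(("1.1.1.1", 22))
print(rule_accepts("new", "1.1.1.1", 22, allowed))        # True
```

The rule itself never changes; only the data it refers to does.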

Summary

nftables provides a powerful, flexible, and efficient framework for packet filtering in Linux. The advanced features of nftables, such as concatenation and referencing, allow for creating sophisticated and manageable firewall rules.

Written by Hemant Rawat