Message-Id: <20201118223011.3216-1-pablo@netfilter.org>
Date: Wed, 18 Nov 2020 23:30:02 +0100
From: Pablo Neira Ayuso <pablo@...filter.org>
To: netfilter-devel@...r.kernel.org
Cc: davem@...emloft.net, netdev@...r.kernel.org, kuba@...nel.org,
razor@...ckwall.org, tobias@...dekranz.com, jeremy@...zel.net
Subject: [PATCH net-next,v4 0/9] netfilter: flowtable bridge and vlan enhancements
Hi,
The following patchset augments the Netfilter flowtable fastpath [1] to
support network topologies that combine IP forwarding, bridge and
VLAN devices.
This v4 includes updates for Patches:
- Patch #3 Check for possible device stack overflow in
dev_fill_forward_path(), per Jakub Kicinski.
- Patch #3 Remove memset for the net_device_path_ctx structure in
dev_fill_forward_path() and avoid full memset of struct
net_device_path_stack.
- Patch #5 Check for valid bridge dst pointer after READ_ONCE, per Nikolay
Aleksandrov.
- Patch #6 Remove initialization of the net_device_path_ctx structure,
already done in patch #3.
A typical scenario that can benefit from this infrastructure is composed
of several VMs connected to bridge ports where the bridge master device
'br0' has an IP address. A DHCP server is also assumed to be running to
provide connectivity to the VMs. The VMs reach the Internet through
'br0' as default gateway, which makes the packet enter the IP forwarding
path. Then, netfilter is used to NAT the packets before they leave
through the wan device.
Something like this:
                      fast path
               .------------------------.
              /                          \
              |           IP forwarding   |
              |          /             \  .
              |       br0               eth0
              .       / \
              -- veth1  veth2
                  .
                  .
                  .
                eth0
          ab:cd:ef:ab:cd:ef
                 VM
The idea is to accelerate forwarding by building a fast path that takes
packets from the ingress path of the bridge port and places them in the
egress path of the wan device (and vice versa), thus skipping the
classic bridge and IP stack paths.
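As a rough illustration of the fast path idea, here is a userspace toy model, not kernel code. Every name in it (toy_tuple, toy_flowtable, toy_forward, ...) is hypothetical, and the choice of key fields is an assumption made for the sketch. A packet whose tuple is found in the table goes straight to the recorded egress device. A miss means the packet follows the classic path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_FLOW_SLOTS 64
#define TOY_SLOW_PATH  (-1)   /* no entry: take the classic path */

/* Hypothetical lookup key: addresses, ports, protocol, ingress device. */
struct toy_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t  l4proto;
	int      iif;             /* ingress ifindex, e.g. the bridge port */
};

struct toy_flow {
	int              in_use;
	struct toy_tuple key;
	int              oif;     /* egress ifindex, e.g. the wan device */
};

struct toy_flowtable {
	struct toy_flow slot[TOY_FLOW_SLOTS];
};

void toy_flowtable_init(struct toy_flowtable *ft)
{
	assert(ft);
	memset(ft, 0, sizeof(*ft));
}

/* Field-by-field comparison, so struct padding never matters. */
static int toy_tuple_equal(const struct toy_tuple *a, const struct toy_tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->l4proto == b->l4proto && a->iif == b->iif;
}

/* Simple multiplicative hash, reduced to a slot index. */
static uint32_t toy_hash(const struct toy_tuple *t)
{
	uint32_t h = t->src_ip;

	h = h * 31u + t->dst_ip;
	h = h * 31u + t->src_port;
	h = h * 31u + t->dst_port;
	h = h * 31u + t->l4proto;
	h = h * 31u + (uint32_t)t->iif;
	return h % TOY_FLOW_SLOTS;
}

/* Insert or update an entry using linear probing; -1 if the table is full. */
int toy_flow_add(struct toy_flowtable *ft, const struct toy_tuple *key, int oif)
{
	assert(ft && key);
	uint32_t h = toy_hash(key);

	for (uint32_t i = 0; i < TOY_FLOW_SLOTS; i++) {
		struct toy_flow *f = &ft->slot[(h + i) % TOY_FLOW_SLOTS];

		if (!f->in_use || toy_tuple_equal(&f->key, key)) {
			f->in_use = 1;
			f->key = *key;
			f->oif = oif;
			return 0;
		}
	}
	return -1;
}

/* Return the egress ifindex on a hit, TOY_SLOW_PATH on a miss. */
int toy_forward(const struct toy_flowtable *ft, const struct toy_tuple *key)
{
	assert(ft && key);
	uint32_t h = toy_hash(key);

	for (uint32_t i = 0; i < TOY_FLOW_SLOTS; i++) {
		const struct toy_flow *f = &ft->slot[(h + i) % TOY_FLOW_SLOTS];

		if (!f->in_use)
			break;            /* entries are never deleted in this toy */
		if (toy_tuple_equal(&f->key, key))
			return f->oif;
	}
	return TOY_SLOW_PATH;
}
```

In this model the first packet of a connection misses and takes the classic path. Once an entry has been added for its tuple, later packets with the same tuple are sent directly to the stored egress device.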
This patchset is composed of:
Patch #1 adds a placeholder for the hash calculation, instead of using
the dir field.
Patch #2 adds the transmit path type field to the flow tuple. Two transmit
paths are supported so far: the neighbour and the xfrm transmit
paths. This patch comes in preparation to add a new direct ethernet
transmit path (see patch #7).
Patch #3 adds dev_fill_forward_path() and .ndo_fill_forward_path() to
netdev_ops. This new function describes the list of netdevice hops
to reach a given destination MAC address in the local network topology,
e.g.
                   IP forwarding
                  /             \
               br0               eth0
               / \
           veth1  veth2
            .
            .
            .
           eth0
     ab:cd:ef:ab:cd:ef
where veth1 and veth2 are bridge ports and eth0 provides Internet
connectivity. eth0 is the interface in the VM which is connected to
the veth1 bridge port. Then, for packets going to br0 whose
destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path()
provides the following path: br0 -> veth1.
Patch #4 adds .ndo_fill_forward_path for VLAN devices, which provides the next
device hop via vlan->real_dev. This annotates the VLAN id and protocol.
This is useful to know what VLAN headers are expected from the ingress
device. This also provides information regarding the VLAN headers
to be pushed in the egress path.
Patch #5 adds .ndo_fill_forward_path for bridge devices, which performs
FDB lookups to locate the next device hop (bridge port) in the
forwarding path.
Patch #6 updates the flowtable to use the dev_fill_forward_path()
infrastructure to obtain the ingress device in the fastpath.
Patch #7 updates the flowtable to use dev_fill_forward_path() to obtain the
egress device in the forwarding path. This also adds the direct
ethernet transmit path, which pushes the ethernet header to the
packet and sends it through dev_queue_xmit(). This patch adds
support for the bridge, so bridge ports use this direct xmit path.
Patch #8 adds ingress VLAN support (up to 2 VLAN tags, QinQ). The VLAN
information is also provided by dev_fill_forward_path(). The VLAN id
and protocol are stored in the flow tuple for hash lookups. The VLAN
support in the xmit path is achieved by annotating the first vlan
device found in the xmit path and by calling dev_hard_header()
(see patch #7) before dev_queue_xmit().
Patch #9 extends the nft_flowtable.sh selftest with a test that covers
the bridge and VLAN support added by this patchset.
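The path resolution described for patches #3 to #5 can be sketched as a userspace toy model. This is not the kernel implementation. All types and functions below (toy_dev, toy_hop, toy_fill_forward_path, ...) are hypothetical stand-ins, and the depth limit of 5 is an assumed value. Each virtual device provides a callback that returns the next lower device: the VLAN callback follows real_dev and records the VLAN id and protocol, and the bridge callback looks up the destination MAC address in a small FDB array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PATH_MAX 5        /* assumed bound on stacked devices */
#define TOY_ETH_ALEN 6

struct toy_dev;

/* One resolved hop; VLAN fields stay 0 unless the hop is a VLAN device. */
struct toy_hop {
	const struct toy_dev *dev;
	uint16_t vlan_id;
	uint16_t vlan_proto;
};

struct toy_path {
	int num_hops;
	struct toy_hop hop[TOY_PATH_MAX];
};

struct toy_fdb_entry {
	uint8_t mac[TOY_ETH_ALEN];
	const struct toy_dev *port;
};

struct toy_dev {
	const char *name;
	/* Returns the next lower device, or NULL if it cannot be resolved. */
	const struct toy_dev *(*fill_forward_path)(const struct toy_dev *dev,
						   const uint8_t *daddr,
						   struct toy_hop *hop);
	const struct toy_dev *real_dev;       /* VLAN: underlying device */
	uint16_t vlan_id, vlan_proto;         /* VLAN: tag to annotate */
	const struct toy_fdb_entry *fdb;      /* bridge: forwarding database */
	size_t fdb_len;
};

/* VLAN device: annotate the tag, continue with the underlying device. */
const struct toy_dev *toy_vlan_fill(const struct toy_dev *dev,
				    const uint8_t *daddr, struct toy_hop *hop)
{
	(void)daddr;
	hop->vlan_id = dev->vlan_id;
	hop->vlan_proto = dev->vlan_proto;
	return dev->real_dev;
}

/* Bridge device: look up the destination MAC to find the bridge port. */
const struct toy_dev *toy_bridge_fill(const struct toy_dev *dev,
				      const uint8_t *daddr, struct toy_hop *hop)
{
	(void)hop;
	for (size_t i = 0; i < dev->fdb_len; i++)
		if (memcmp(dev->fdb[i].mac, daddr, TOY_ETH_ALEN) == 0)
			return dev->fdb[i].port;
	return NULL;
}

/* Walk down the device stack; 0 on success, -1 on failure or overflow. */
int toy_fill_forward_path(const struct toy_dev *dev, const uint8_t *daddr,
			  struct toy_path *path)
{
	assert(dev && daddr && path);
	path->num_hops = 0;

	while (dev) {
		if (path->num_hops == TOY_PATH_MAX)
			return -1;            /* device stack too deep */

		struct toy_hop *hop = &path->hop[path->num_hops++];

		hop->dev = dev;
		hop->vlan_id = 0;
		hop->vlan_proto = 0;

		if (!dev->fill_forward_path)
			break;                /* no callback: last device */

		dev = dev->fill_forward_path(dev, daddr, hop);
		if (!dev)
			return -1;            /* e.g. MAC not in the FDB */
	}
	return 0;
}
```

With a toy br0 whose FDB maps ab:cd:ef:ab:cd:ef to veth1, the walk records the hops br0 and then veth1, matching the example in patch #3. An unknown MAC address makes the walk fail, in which case a caller would presumably fall back to the classic path.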
= Performance numbers
My testbed environment consists of three containers:
  192.168.20.2     .20.1     .10.1     10.141.10.2
        veth0       veth0    veth1       veth0
        ns1 <---------> nsr1 <--------> ns2
                         SNAT
     iperf -c                          iperf -s
where nsr1 is used for forwarding. There is a bridge device br0 in nsr1,
veth0 is a port of br0. SNAT is performed on the veth1 device of nsr1.
- ns2 runs iperf -s
- ns1 runs iperf -c 10.141.10.2 -n 100G
My results are:
- Baseline (no flowtable, classic forwarding path + netfilter): ~16 Gbit/s
- Fastpath (with flowtable, this patchset): ~25 Gbit/s
This is an improvement of ~50% compared to baseline.
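For reference, the arithmetic behind the relative gain is sketched below. The helper name is hypothetical. The inputs are the approximate ~16 and ~25 Gbit/s readings, so the result is only as precise as those.

```c
#include <assert.h>

/* Relative throughput gain of `fast` over `baseline`, in percent. */
double toy_relative_gain_pct(double baseline, double fast)
{
	assert(baseline > 0.0);
	return (fast - baseline) / baseline * 100.0;
}
```

Calling toy_relative_gain_pct(16.0, 25.0) evaluates to 56.25, in line with the rough ~50% figure given above.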
Please, apply this patchset.
Thank you.
Pablo Neira Ayuso (9):
netfilter: flowtable: add hash offset field to tuple
netfilter: flowtable: add xmit path types
net: resolve forwarding path from virtual netdevice and HW destination address
net: 8021q: resolve forwarding path for vlan devices
bridge: resolve forwarding path for bridge devices
netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device
netfilter: flowtable: use dev_fill_forward_path() to obtain egress device
netfilter: flowtable: add vlan support
selftests: netfilter: flowtable bridge and VLAN support
include/linux/netdevice.h | 35 +++
include/net/netfilter/nf_flow_table.h | 43 +++-
net/8021q/vlan_dev.c | 15 ++
net/bridge/br_device.c | 27 +++
net/core/dev.c | 36 ++++
net/netfilter/nf_flow_table_core.c | 51 +++--
net/netfilter/nf_flow_table_ip.c | 200 ++++++++++++++----
net/netfilter/nft_flow_offload.c | 159 +++++++++++++-
.../selftests/netfilter/nft_flowtable.sh | 82 +++++++
9 files changed, 588 insertions(+), 60 deletions(-)
--
2.20.1