[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALx6S34JePNX62=rPq5aTW6W_tpPwSeseGcq13iAaJ9Y53QTiQ@mail.gmail.com>
Date: Thu, 9 Jul 2020 19:33:11 -0700
From: Tom Herbert <tom@...bertland.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: "David S. Miller" <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Saeed Mahameed <saeedm@...lanox.com>,
Michael Chan <michael.chan@...adcom.com>,
edwin.peer@...adcom.com, emil.s.tantilov@...el.com,
alexander.h.duyck@...ux.intel.com,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Tariq Toukan <tariqt@...lanox.com>,
Michal Kubecek <mkubecek@...e.cz>
Subject: Re: [PATCH net-next v2 00/10] udp_tunnel: add NIC RX port offload infrastructure
The patch set looks good for its purpose, but, sorry, I can't resist
another round of complaining about vendors that continue to develop
protocol specific offloads instead of moving to protocol agnostic
generic offloads.
On Wed, Jul 8, 2020 at 6:19 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> Kernel has a facility to notify drivers about the UDP tunnel ports
> so that devices can recognize tunneled packets. This is important
> mostly for RX - devices which don't support CHECKSUM_COMPLETE can
> report checksums of inner packets, and compute RSS over inner headers.
With the more use of various routing headers and other protocols we're
going to see more and more cases where it's impossible for devices to
verify checksums. CHECKSUM_COMPLETE is the only way forward!
> Some drivers also match the UDP tunnel ports also for TX, although
> doing so may lead to false positives and negatives.
Hmm, do you know why they do that? I suppose because they don't
support NETIF_F_HW_CSUM and want to offload an inner checksum so they
need to parse the packet in device, which sadly means they need to
parse the packet in the driver to make sure the device can parse it
and set the checksum. So much driver complexity to support protocol
specific offloads... there is no reason why a driver should ever need
to parse a packet on transmit, everything it needs to know is in the
skbuff. Oh, what a tangled web we weave...
>
> Unfortunately the user experience when trying to take adavantage
> of these facilities is suboptimal. First of all there is no way
> for users to check which ports are offloaded. Many drivers resort
> to printing messages to aid debugging, other use debugfs. Even worse
> the availability of the RX features (NETIF_F_RX_UDP_TUNNEL_PORT)
> is established purely on the basis of the driver having the ndos
> installed. For most drivers, however, the ability to perform offloads
> is contingent on device capabilities (driver support multiple device
> and firmware versions). Unless driver resorts to hackish clearing
> of features set incorrectly by the core - users are left guessing
> whether their device really supports UDP tunnel port offload or not.
>
> There is currently no way to indicate or configure whether RX
> features include just the checksum offload or checksum and using
> inner headers for RSS. Many drivers default to not using inner
> headers for RSS because most implementations populate the source
> port with entropy from the inner headers. This, however, is not
> always the case, for example certain switches are only able to
> use a fixed source port during encapsulation.
>
Well, the concept of devices parsing UDP payloads is marginal to begin
with because of the port number ambiguity problem, but as long as the
device isn't modifying UDP payload data I guess we live with it.
As for RSS over the internal headers, I believe this is another lost
cause. The device can only parse what it's programmed to understand
which will always be less than what the host can do. Device sees a new
encap or extension header it doesn't understand, then it doesn't go
into that header and it hits a roadblock. For IPv6 all this should be
remedied by the flow label anyway which Linux and all other OSes set
by default, so in that case a device doesn't need to do anything more
than parse the IP header to do both RSS and RX csum.
> We have also seen many driver authors get the intricacies of UDP
> tunnel port offloads wrong. Most commonly the drivers forget to
> perform reference counting, or take sleeping locks in the callbacks.
>
> This work tries to improve the situation by pulling the UDP tunnel
> port table maintenance out of the drivers. It turns out that almost
> all drivers maintain a fixed size table of ports (in most cases one
> per tunnel type), so we can take care of all the refcounting in the
> core, and let the driver specify if they need to sleep in the
> callbacks or not. The new common implementation will also support
> replacing ports - when a port is removed from a full table it will
> try to find a previously missing port to take its place.
>
> This patch only implements the core functionality along with a few
> drivers I was hoping to test manually [1] along with a test based
> on a netdevsim implementation. Following patches will convert all
> the drivers. Once that's complete we can remove the ndos, and rely
> directly on the new infrastrucutre.
>
> Then after RSS (RXFH) is converted to netlink we can add the ability
> to configure the use of inner RSS headers for UDP tunnels.
>
> [1] Unfortunately I wasn't able to, turns out 2 of the devices
> I had access to were older generation or had old FW, and they
> did not actually support UDP tunnel port notifications (see
> the second paragraph). The thrid device appears to program
> the UDP ports correctly but it generates bad UDP checksums with
> or without these patches. Long story short - I'd appreciate
> reviews and testing here..
>
> Jakub Kicinski (10):
> debugfs: make sure we can remove u32_array files cleanly
> udp_tunnel: re-number the offload tunnel types
> udp_tunnel: add central NIC RX port offload infrastructure
> ethtool: add tunnel info interface
> netdevsim: add UDP tunnel port offload support
> selftests: net: add a test for UDP tunnel info infra
> ixgbe: don't clear UDP tunnel ports when RXCSUM is disabled
> ixgbe: convert to new udp_tunnel_nic infra
> bnxt: convert to new udp_tunnel_nic infra
> mlx4: convert to new udp_tunnel_nic infra
>
> Documentation/filesystems/debugfs.rst | 12 +-
> Documentation/networking/ethtool-netlink.rst | 33 +
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 133 +--
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6 -
> drivers/net/ethernet/intel/ixgbe/ixgbe.h | 3 -
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 200 +---
> .../net/ethernet/mellanox/mlx4/en_netdev.c | 107 +--
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 2 -
> drivers/net/geneve.c | 6 +-
> drivers/net/netdevsim/Makefile | 2 +-
> drivers/net/netdevsim/dev.c | 1 +
> drivers/net/netdevsim/netdev.c | 12 +-
> drivers/net/netdevsim/netdevsim.h | 19 +
> drivers/net/netdevsim/udp_tunnels.c | 192 ++++
> drivers/net/vxlan.c | 6 +-
> fs/debugfs/file.c | 27 +-
> include/linux/debugfs.h | 12 +-
> include/linux/netdevice.h | 8 +
> include/net/udp_tunnel.h | 152 ++-
> include/uapi/linux/ethtool.h | 2 +
> include/uapi/linux/ethtool_netlink.h | 55 ++
> mm/cma.h | 3 +
> mm/cma_debug.c | 7 +-
> net/ethtool/Makefile | 3 +-
> net/ethtool/common.c | 9 +
> net/ethtool/common.h | 1 +
> net/ethtool/netlink.c | 12 +
> net/ethtool/netlink.h | 4 +
> net/ethtool/strset.c | 5 +
> net/ethtool/tunnels.c | 259 +++++
> net/ipv4/Makefile | 3 +-
> net/ipv4/{udp_tunnel.c => udp_tunnel_core.c} | 0
> net/ipv4/udp_tunnel_nic.c | 897 ++++++++++++++++++
> net/ipv4/udp_tunnel_stub.c | 7 +
> .../drivers/net/netdevsim/udp_tunnel_nic.sh | 786 +++++++++++++++
> 35 files changed, 2597 insertions(+), 389 deletions(-)
> create mode 100644 drivers/net/netdevsim/udp_tunnels.c
> create mode 100644 net/ethtool/tunnels.c
> rename net/ipv4/{udp_tunnel.c => udp_tunnel_core.c} (100%)
> create mode 100644 net/ipv4/udp_tunnel_nic.c
> create mode 100644 net/ipv4/udp_tunnel_stub.c
> create mode 100644 tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
>
> --
> 2.26.2
>
Powered by blists - more mailing lists