netdev - Re: [PATCH net-next] sfc: remove udp_tnl_has

Message-ID: <20200701161145.3f9f9a06@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Wed, 1 Jul 2020 16:11:45 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Edward Cree <ecree@...arflare.com>
Cc:     <davem@...emloft.net>, <netdev@...r.kernel.org>,
        <mhabets@...arflare.com>, <linux-net-drivers@...arflare.com>
Subject: Re: [PATCH net-next] sfc: remove udp_tnl_has_port

On Wed, 1 Jul 2020 23:02:09 +0100 Edward Cree wrote:
> On 01/07/2020 19:43, Jakub Kicinski wrote:
> > There's a number of drivers which try to match the UDP ports. That
> > seems fragile to me. Is it actually required for HW to operate
> > correctly?   
> For EF10 hardware, yes, because the hardware parses the packet headers
>  to find for itself the offsets at which to make the various TSO edits;
>  thus its parser needs to know which UDP ports correspond to VXLAN or
>  GENEVE.  If a GSO skb arrives at the driver with skb->encapsulation
>  set but on a UDP port that's not known to the hardware, the driver
>  will have to reject it in ndo_features_check() or 'manually' fall back
>  to software segmentation from the transmit path.

I see. I'm asking because I'm working on a rewrite of the UDP
tunnel-related callbacks. I'll keep the ef10's table checking, then.
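
The table check discussed above can be modelled in userspace roughly
as follows. This is a minimal sketch, not the sfc driver's code: the
struct names, the feature bit and the helper functions are all
hypothetical stand-ins for the skb, the hardware's UDP tunnel port
table and the GSO feature flags that ndo_features_check() would mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are real kernel or sfc types. */
enum encap_type { ENCAP_NONE, ENCAP_VXLAN, ENCAP_GENEVE };

struct tnl_port {
	uint16_t port;		/* UDP destination port, host byte order */
	enum encap_type type;	/* ENCAP_NONE marks an unused slot */
};

struct model_skb {
	bool encapsulation;	/* models skb->encapsulation */
	bool is_gso;		/* models a GSO skb */
	uint16_t udp_dport;	/* outer UDP destination port */
};

#define MODEL_F_GSO 0x1u	/* hypothetical "hardware may segment" bit */

/* Is this port present in the table the hardware parser was given? */
static bool tnl_has_port(const struct tnl_port *tbl, size_t n, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].type != ENCAP_NONE && tbl[i].port == port)
			return true;
	return false;
}

/*
 * Clear the GSO bit for encapsulated GSO packets whose outer UDP port
 * the hardware does not know, so the stack segments them in software.
 */
static unsigned int model_features_check(const struct model_skb *skb,
					 const struct tnl_port *tbl, size_t n,
					 unsigned int features)
{
	if (skb->is_gso && skb->encapsulation &&
	    !tnl_has_port(tbl, n, skb->udp_dport))
		features &= ~MODEL_F_GSO;
	return features;
}
```

With only port 4789 (VXLAN) in the table, an encapsulated GSO packet
to port 6081 loses the GSO bit, while one to port 4789 keeps it.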

We can drop this patch if you plan to upstream the support for TX
side offloads soon.

> EF10 also makes use of encap parsing on receive, for CHECKSUM_UNNECESSARY
>  offload (with CSUM_LEVEL) as well as RSS and filtering on inner headers
>  (although there is currently no driver support for inner-frame RXNFC, as
>  ethtool's API doesn't cover it).
> > Aren't the ports per ns in the kernel? There's no guarantee that some
> > other netns won't send a TSO skb and whatever other UDP encap.  
> That is indeed one of the flaws with port-based tunnel offloads; in
>  theory the UDP port's scope is only the 3-tuple of the socket used by
>  the tunnel device, so never mind netns, it would be logically valid to
>  use the same port for different encap protocols on different IP
>  addresses on the same network interface.
> AFAICT udp_tunnel_notify_add_rx_port() gets a netns from the sock and
>  then calls the ndo for every netdev in that ns.  So in a setup like
>  that, the ndo would get called twice for the same port (without any IP
>  address information other than sa_family being passed to the driver),
>  the driver would ignore the second one (print a netif_dbg and return
>  EEXIST, which the caller ignores), and any TSO skbs trying to use the
>  second one would be parsed by the hardware with the wrong encap type
>  and probably go out garbled on the wire.  I think at the time everyone
>  took the position that "this is a really unlikely setup and if anybody
>  really wants to do that they'll just have to turn off encap TSO".
> 
> So ndo_udp_tunnel_add is a fundamentally broken interface that people
>  shouldn't design new hardware to support but it's close enough that it
>  seems reasonable to use it to get _some_ encap TSO mileage out of the
>  port-based hardware that already exists.  Agree/disagree/other?
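
The collision described in the quoted text, where the same port is
announced twice with different encap types, can be illustrated with a
small userspace model. This is a sketch under stated assumptions: the
table layout, its size and the function names are hypothetical and
are not taken from any driver. Only the behaviour (second add returns
-EEXIST and the first encap type stays in effect) follows the text.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a port-keyed tunnel table (no IP address key). */
enum encap_type { ENCAP_NONE, ENCAP_VXLAN, ENCAP_GENEVE };

struct tnl_entry {
	uint16_t port;
	enum encap_type type;	/* ENCAP_NONE marks a free slot */
};

#define TBL_SIZE 4		/* arbitrary size for the sketch */

struct tnl_table {
	struct tnl_entry e[TBL_SIZE];
};

/* Add a port; a port that is already present is refused with -EEXIST. */
static int tnl_add(struct tnl_table *t, uint16_t port, enum encap_type type)
{
	struct tnl_entry *free_slot = NULL;

	for (size_t i = 0; i < TBL_SIZE; i++) {
		if (t->e[i].type == ENCAP_NONE) {
			if (!free_slot)
				free_slot = &t->e[i];
		} else if (t->e[i].port == port) {
			return -EEXIST;
		}
	}
	if (!free_slot)
		return -ENOSPC;
	free_slot->port = port;
	free_slot->type = type;
	return 0;
}

/* Encap type the modelled hardware would assume for this port. */
static enum encap_type tnl_lookup(const struct tnl_table *t, uint16_t port)
{
	for (size_t i = 0; i < TBL_SIZE; i++)
		if (t->e[i].type != ENCAP_NONE && t->e[i].port == port)
			return t->e[i].type;
	return ENCAP_NONE;
}
```

Adding port 4789 as VXLAN and then again as GENEVE leaves the lookup
returning VXLAN, so in this model traffic from the second tunnel would
be parsed with the wrong encap type.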

The port offload interface is just a hint for RX side offloads which
can't cause harm. It's the use of this hint as a hard fact for TX
offloads which is incorrect.

If the NIC thinks the inner csum is invalid because the packet was in
fact not carrying encapsulated frames, it should just pass the packet
up the stack. We don't trust NICs to tell us checksums are wrong.
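
A minimal userspace sketch of that receive-side rule follows. The
enum and struct are hypothetical models of the kernel's ip_summed
states, not the real definitions: a "good" verdict from the hardware
is used as an optimisation, a "bad" verdict is ignored and the packet
is still delivered for the stack to verify in software.

```c
#include <stdbool.h>

/* Hypothetical model of the two relevant ip_summed states. */
enum model_ip_summed {
	MODEL_CHECKSUM_NONE,		/* stack must verify the checksum */
	MODEL_CHECKSUM_UNNECESSARY,	/* hardware vouches for the checksum */
};

struct rx_verdict {
	bool deliver;			/* pass the packet up the stack? */
	enum model_ip_summed ip_summed;
};

/*
 * Trust the hardware only when it reports a good checksum. A bad
 * report may just mean the parser guessed the wrong encapsulation,
 * so the packet is never dropped on the hardware's word alone.
 */
static struct rx_verdict model_rx_csum(bool hw_csum_ok)
{
	struct rx_verdict v = {
		.deliver = true,
		.ip_summed = hw_csum_ok ? MODEL_CHECKSUM_UNNECESSARY
					: MODEL_CHECKSUM_NONE,
	};
	return v;
}
```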

RSS is also relatively harmless if it goes wrong. Most NICs actually
default to not computing RSS using inner headers, to stay on the safe
side.