netdev - Re: [PATCH net-next v4 11/11] net: dsa: realtek: rtl8365mb: multiple cpu ports, non cpu extint

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220124134242.595fd728@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
Date:   Mon, 24 Jan 2022 13:42:42 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Florian Fainelli <f.fainelli@...il.com>,
        Luiz Angelo Daros de Luca <luizluca@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        Frank Wunderlich <frank-w@...lic-files.de>,
        Alvin Šipraga <ALSI@...g-olufsen.dk>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linus.walleij@...aro.org" <linus.walleij@...aro.org>,
        "vivien.didelot@...il.com" <vivien.didelot@...il.com>,
        "arinc.unal@...nc9.com" <arinc.unal@...nc9.com>
Subject: Re: [PATCH net-next v4 11/11] net: dsa: realtek: rtl8365mb:
 multiple cpu ports, non cpu extint

On Mon, 24 Jan 2022 22:56:07 +0200 Vladimir Oltean wrote:
> On Mon, Jan 24, 2022 at 11:38:12AM -0800, Jakub Kicinski wrote:
> > On Mon, 24 Jan 2022 21:08:45 +0200 Vladimir Oltean wrote:  
> > > So before we declare that any given Ethernet driver is buggy for declaring
> > > NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM and not checking that skb->csum_start
> > > points where it expects it to (taking into consideration potential VLAN
> > > headers, IPv6 extension headers),  
> >
> > Extension headers are explicitly not supported by NETIF_F_IPV6_CSUM.
> >
> > IIRC Tom's hope was to delete NETIF_F_IP*_CSUM completely once all
> > drivers are converted to parsing and therefore can use NETIF_F_HW_CSUM.  
> 
> IIUC, NETIF_F_IP*_CSUM vs NETIF_F_HW_CSUM doesn't make that big of a
> difference in terms of what the driver should check for, if the hardware
> checksum offload engine can't directly be given the csum_start and
> csum_offset, wherever they may be.
> 
> > > is there any driver that _does_ perform these checks correctly, that
> > > could be used as an example?  
> >
> > I don't think so. Let me put it this way - my understanding is that up
> > until now we had been using the vlan_features, mpls_features etc to
> > perform L2/L2.5/below-IP feature stripping. This scales poorly to DSA
> > tags, as discussed in this thread.
> >
> > I'm suggesting we extend the kind of checking we already do to work
> > around inevitable deficiencies of device parsers for tunnels to DSA
> > tags.  
> 
> Sorry, I'm very tired and I probably don't understand what you're
> saying, so excuse the extra clarification questions.
> 
> The typical protocol checking that drivers with NETIF_F_HW_CSUM do seems
> to be based on vlan_get_protocol()/skb->protocol/skb_network_header()/
> skb_transport_header() values, all of which make DSA invisible. So they
> don't work if the underlying hardware really doesn't like seeing an
> unexpected DSA header.
> 
> When you say "I'm suggesting we extend the kind of checking we already do",
> do you mean we should modify the likes of e1000e and igb such that, if
> they're ever used as DSA masters, they do a full header parse of the
> packet (struct ethhdr :: h_proto, check if VLAN, struct iphdr/ipv6hdr,
> etc.) instead of the current logic?

That was my thinking, yes. The exact amount of work depends on the
driver, I believe that more recent Intel parts (igb, ixgbe and newer)
pass a L3 offset to the HW. They treat L2 as opaque, ergo no patches
needed. At a glance e1000e passes the full skb_checksum_start_offset()
to HW, so likely even better. 

It's only drivers for devices which actually want to parse the Ethertype
that would need extra checks. (Coincidentally such devices can't support
MPLS given the lack of L3 indication in the frame.)

> It will be pretty convoluted unless
> we have some helper. Because if I follow through, for a DSA-tagged IP
> packet on xmit, skb->protocol is certainly htons(ETH_P_IP):
> 
> ntohs(skb->protocol) = 0x800, csum_offset = 16, csum_start = 280, skb_checksum_start_offset = 54, skb->network_header = 260, skb_network_header_len = 20
> 
> skb_dump output:
> skb len=94 headroom=226 headlen=94 tailroom=384
> mac=(226,34) net=(260,20) trans=280
> shinfo(txflags=0 nr_frags=0 gso(size=0 type=1 segs=1))
> csum(0x100118 ip_summed=3 complete_sw=0 valid=0 level=0)
> hash(0x7710ee84 sw=0 l4=1) proto=0x0800 pkttype=0 iif=0
> dev name=eno2 feat=0x00020100001149a9
> sk family=2 type=1 proto=6
> skb headroom: 00000000: 6c 00 03 02 64 65 76 00 fe ed ca fe 28 00 00 00
> ...(junk)...
> skb headroom: 000000e0: 5f 43
>                         20 byte DSA tag
>                         |
>                         v
> skb linear:   00000000: 88 80 00 0a 80 00 00 00 00 00 00 00 08 00 30 00
>                                     skb_mac_header()
>                                     |
>                                     v
> skb linear:   00000010: 00 00 00 00 68 05 ca 92 af 20 00 04 9f 05 f6 28
>                               skb_network_header()
>                               |
>                               v
> skb linear:   00000020: 08 00 45 00 00 3c 26 47 40 00 40 06 00 49 0a 00
>                                           skb_checksum_start_offset
>                                           |
>                                           |                       csum_offset
>                                           v                       v
> skb linear:   00000030: 00 2c 0a 00 00 01 b6 08 14 51 11 1f 91 4f 00 00
> skb linear:   00000040: 00 00 a0 02 fa f0 14 5b 00 00 02 04 05 b4 04 02
> skb linear:   00000050: 08 0a 2e 00 e5 b8 00 00 00 00 01 03 03 07

Oof, so in this case the DSA tag is _before_ the skb_mac_header()?
Or the prepend is supposed to be parsable as a Ethernet header?
Seems like any device that can do csum over this packet must already 
use L3/L4 offsets or have explicit knowledge of DSA, right?

> I don't know, I just don't expect that non-DSA users of those drivers
> will be very happy about such changes. Do these existing protocol
> checking schemes qualify as buggy?

Unfortunate reality of the checksum offloads is that most drivers for
devices which parse on Tx are buggy, it's more of a question of whether
anyone tried to use an unsupported protocol stack :( Recent example
that comes to mind is 1698d600b361 ("bnxt_en: Implement
.ndo_features_check().").

> If this is the convention that we want to enforce, then I can't really
> help Luiz with fixing the OpenWRT mtk_eth_soc.c - he'll have to figure
> out a way to parse the packets for which his hardware will accept the
> checksumming offload, and call skb_checksum_help() otherwise.
> 
> > We can come up with various schemes of expressing capabilities
> > between underlying driver and tag driver. I'm not aware of similar
> > out-of-band schemes existing today so it'd be "DSA doing it's own
> > thing", which does not seem great.  
> 
> It at least seems less complex to me, and less checking in the fast path
> if I understand everything that's been said correctly.

I understand, I'm primarily trying to share some context and prior work.
I don't mean to nack all other approaches.

I believe writing a parser matching the device behavior would be easier
for a driver author than interpreting the runes of our csum offload API
and getting thru the thicket of all the bits. If that's not the case my
argument is likely defeated.