netdev - Re: Ethernet padding - ti

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cfa6494d-0d7c-a260-6f9f-d7b8d74b287e@gmail.com>
Date:   Mon, 31 May 2021 08:22:03 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Vladimir Oltean <olteanv@...il.com>,
        Ben Hutchings <ben.hutchings@...ensium.com>
Cc:     netdev@...r.kernel.org,
        Grygorii Strashko <grygorii.strashko@...com>,
        linux-omap@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>
Subject: Re: Ethernet padding - ti_cpsw vs DSA tail tag



On 5/31/2021 7:29 AM, Vladimir Oltean wrote:
> Hi Ben,
> 
> On Mon, May 31, 2021 at 02:40:52PM +0200, Ben Hutchings wrote:
>> I'm working on a system that uses a TI Sitara SoC with one of its
>> Ethernet ports connected to the host port of a Microchip KSZ8795
>> switch.  I'm updating the kernel from 4.14.y to 5.10.y.  Currently I
>> am using the ti_cpsw driver, but it looks like the ti_cpsw_new driver
>> has the same issue.
>>
>> The Microchip switch expects a tail tag on ingress from the host port
>> to control which external port(s) to forward to.  This must appear
>> immediately before the frame checksum.  The DSA core correctly pads
>> outgoing skbs to at least 60 bytes before tag_ksz appends the tag.
>>
>> However, since commit 9421c9015047 ("net: ethernet: ti: cpsw: fix min
>> eth packet size"), the cpsw driver pads outgoing skbs to at least 64
>> bytes.  This means that in smaller packets the tag byte is no longer
>> at the tail.
>>
>> It's not obvious to me where this should be fixed.  Should drivers
>> that pad in ndo_start_xmit be aware of any tail tag that needs to be
>> moved?  Should DSA be aware that a lower driver has a minimum size >
>> 60 bytes?
> 
> These are good questions.
> 
> In principle, DSA needs a hint from the master driver for tail taggers
> to work properly. We should pad to ETH_ZLEN + <the hint value> before
> inserting the tail tag. This is for correctness, to ensure we do not
> operate in marginal conditions which are not guaranteed to work.
> 
> A naive approach would be to take the hint from master->min_mtu.
> However, the first issue that appears is that the dev->min_mtu value is
> not always set quite correctly.
> 
> The MTU in general measures the number of bytes in the L2 payload (i.e.
> not counting the Ethernet + VLAN header, nor FCS). The DSA tag is
> considered to be a part of the L2 payload from the perspective of a
> DSA-unaware master.
> 
> But ether_setup() sets up dev->min_mtu by default to ETH_MIN_MTU (68),
> which cites RFC791. This says:
> 
>     Every internet module must be able to forward a datagram of 68
>     octets without further fragmentation.  This is because an internet
>     header may be up to 60 octets, and the minimum fragment is 8 octets.
> 
> But many drivers simply don't set dev->min_mtu = 0, even if they support
> sending minimum-sized Ethernet frames. Many set dev->min_mtu to ETH_ZLEN,
> proving nothing except the fact that they don't understand that the
> Ethernet header should not be counted by the MTU anyway.
> 
> So to work with these drivers which leave dev->min_mtu = ETH_MIN_MTU, we
> would have to pad the packets in DSA to ETH_ZLEN + ETH_MIN_MTU. This is
> not quite ideal, so even if it would be the correct approach, a large
> amount of drivers would have to be converted to set dev->min_mtu = 0
> before we could consider switching to that and not have too many
> regressions.
> 
> Also, dev->min_mtu does not appear to have a very strict definition
> anywhere other than "Interface Minimum MTU value". My hopes were some
> guarantees along the lines of "if you try to send a packet with a
> smaller L2 payload than dev->mtu, the controller might pad the packet".
> But no luck with that, it seems.
> 
> Going to commit 9421c9015047, it looks like that took a shortcut for
> performance reasons, and omitted to check whether the skb is actually
> VLAN-tagged or not, and if egress untagging was requested or not.
> My understanding is that packets smaller than CPSW_MIN_PACKET_SIZE _can_
> be sent, it's only that the value was chosen (too) conservatively as
> VLAN_ETH_ZLEN. The cpsw driver might be able to check whether the packet
> is a VLAN tagged one by looking at skb->protocol, and choose the pad
> size dynamically. Although I can understand why Grygorii might not want
> to do that.

Agree, that specific commit seems to be possibly by off by 4 in most cases.

> 
> The pitfall is that even if we declare the proper min_mtu value for
> every master driver, it would still not avoid padding in the cpsw case.
> This is because the reason cpsw pads is due to VLAN, but VLAN is not
> part of the L2 payload, so cpsw would still declare dev->min_mtu = 0 in
> spite of needing to pad.
> 
> The only honest solution might be to extend struct net_device and add a
> pad_size value somewhere in there. You might be able to find a hole with
> pahole or something, and it doesn't need to be larger than an u8 (for up
> to 255 bytes of padding). Then cpsw can set master->pad_size, and DSA
> can look at it for tail taggers.

Do we need another way for drivers to be left a chance to be wrong? Even
if we document the semantics of net_device::pad_size correctly this
probably won't cut it.

TBH, I don't fully understand why the network stack has left so much
leeway for Ethernet drivers to do their own padding as opposed to making
sure that non-tagged (VLAN, DSA, whatever) frames are guaranteed to be
at least 60 bytes when they reach ndo_start_xmit() and then just leave
the stacking of devices to add their bytes where they need them, with
the special trailer case that is a tiny bit harder to figure out. Maybe
back in the days most Ethernet NICs would hardware pad and this only
became a concern with newer/cheaper/embedded SoCs NICs that can no
longer hardware pad by default? I can understand the argument about raw
sockets which should permit an application to have full control over the
minimum packet length, but again, in general we have a real link partner
on the other side that is not going to be very tolerant.
-- 
Florian