netdev - Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath

Message-ID: <83375385-7881-53b7-c685-e166c8bdeba4@cumulusnetworks.com>
Date:   Thu, 26 Mar 2020 14:06:51 +0200
From:   Nikolay Aleksandrov <nikolay@...ulusnetworks.com>
To:     Ido Schimmel <idosch@...sch.org>,
        Vladimir Oltean <olteanv@...il.com>
Cc:     Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        murali.policharla@...adcom.com,
        Stephen Hemminger <stephen@...workplumber.org>,
        Jiri Pirko <jiri@...nulli.us>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH v2 net-next 10/10] net: bridge: implement
 auto-normalization of MTU for hardware datapath

On 26/03/2020 13:35, Ido Schimmel wrote:
> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
>> Hi Ido,
>>
>> On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@...sch.org> wrote:
>>>
>>> Hi Vladimir,
>>>
>>> On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
>>>> From: Vladimir Oltean <vladimir.oltean@....com>
>>>>
>>>> In the initial attempt to add MTU configuration for DSA:
>>>>
>>>> https://patchwork.ozlabs.org/cover/1199868/
>>>>
>>>> Florian raised a concern about the bridge MTU normalization logic (when
>>>> you bridge an interface with MTU 9000 and one with MTU 1500). His
>>>> expectation was that the bridge would automatically change the MTU of
>>>> all its slave ports to the minimum MTU, if those slaves are part of the
>>>> same hardware bridge. However, it doesn't do that, and for good reason,
>>>> I think. What br_mtu_auto_adjust() does is adjust the MTU of the
>>>> bridge net device itself, and not that of any slave port.  If it were to
>>>> modify the MTU of the slave ports, the effect would be that the user
>>>> wouldn't be able to increase the MTU of any bridge slave port as long as
>>>> it was part of the bridge, which would be a bit annoying to say the
>>>> least.
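A minimal user-space model of the behavior described above. It assumes, as we understand the bridge driver, that the bridge device takes the smallest MTU among its ports and that a manually configured bridge MTU is left alone. struct port_model, struct bridge_model, bridge_min_mtu() and bridge_auto_adjust() are hypothetical names; only br_mtu_auto_adjust() comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not kernel types. */
struct port_model {
	const char *name;
	int mtu;
};

struct bridge_model {
	int mtu;              /* MTU of the bridge net device itself */
	bool mtu_set_by_user; /* models a manually configured bridge MTU */
};

#define DEFAULT_MTU 1500  /* assumed value for a bridge with no ports */

/* Smallest MTU among the bridge's ports, or DEFAULT_MTU if there are none. */
static int bridge_min_mtu(const struct port_model *ports, size_t n)
{
	int min = 0;

	assert(n == 0 || ports != NULL);
	for (size_t i = 0; i < n; i++)
		if (min == 0 || ports[i].mtu < min)
			min = ports[i].mtu;

	return min ? min : DEFAULT_MTU;
}

/* Only the bridge device's MTU changes; the ports are read, never written. */
static void bridge_auto_adjust(struct bridge_model *br,
			       const struct port_model *ports, size_t n)
{
	if (br->mtu_set_by_user)
		return;
	br->mtu = bridge_min_mtu(ports, n);
}
```

Because the ports are passed as const, a port with MTU 9000 keeps that MTU; only traffic terminated on the bridge device is limited to the minimum.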
>>>>
>>>> The idea behind this behavior is that normal termination from Linux over
>>>> the L2 forwarding domain described by DSA should happen over the bridge
>>>> net device, which _is_ properly limited by the minimum MTU. And
>>>> termination over individual slave devices is possible even if those are
>>>> bridged. But that is not "forwarding", so there's no reason to do
>>>> normalization there, since only a single interface sees that packet.
>>>>
>>>> The real problem is with the offloaded data path, where, of course, the
>>>> bridge net device MTU is ignored. So a packet received on an interface
>>>> with MTU 9000 would still be forwarded to an interface with MTU 1500.
>>>> And that is exactly what this patch is trying to prevent from happening.
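A minimal sketch of the normalization the subject line describes, assuming that ports offloaded to the same switch share a hardware domain and that every port in that domain is lowered to the smallest MTU found there. struct hw_port, its hw_domain field and normalize_domain_mtu() are hypothetical; this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: ports sharing hw_domain are forwarded between by
 * the switch hardware without consulting the bridge device's MTU. */
struct hw_port {
	int hw_domain; /* assumed identifier of the switch the port sits on */
	int mtu;
};

/* Lower every port in 'domain' to the smallest MTU found in that domain.
 * Returns the normalized MTU, or -1 if no port belongs to the domain. */
static int normalize_domain_mtu(struct hw_port *ports, size_t n, int domain)
{
	int min = -1;

	assert(n == 0 || ports != NULL);
	for (size_t i = 0; i < n; i++)
		if (ports[i].hw_domain == domain &&
		    (min < 0 || ports[i].mtu < min))
			min = ports[i].mtu;

	if (min < 0)
		return -1;

	for (size_t i = 0; i < n; i++)
		if (ports[i].hw_domain == domain)
			ports[i].mtu = min;

	return min;
}
```

Because the function writes to every port in the domain, a port cannot keep an MTU above its peers' while it stays bridged, which is the limitation the changelog notes for modifying slave MTUs.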
>>>
>>> How is that different from the software data path where the CPU needs to
>>> forward the packet between port A with MTU X and port B with MTU X/2 ?
>>>
>>> I don't really understand what problem you are trying to solve here. It
>>> seems like the user did some misconfiguration and now you're introducing
>>> a policy to mitigate it? If so, it should be something the user can
>>> disable. It also seems like something that can be easily handled by a
>>> user space application. You get netlink notifications for all these
>>> operations.
>>>
>>
>> Actually I think the problem can be better understood if I explain
>> what the switches I'm dealing with look like.
>> None of them really has an 'MTU' register. They perform length-based
>> admission control on RX.
> 
> IIUC, by that you mean that these switches only perform length-based
> filtering on RX, but not on TX?
> 
>> At this moment in time I don't think anybody wants to introduce an MRU
>> knob in iproute2, so we're adjusting that maximum ingress length
>> through the MTU. But it becomes an inverted problem, since the 'MTU'
>> needs to be controlled for all possible sources of traffic that are
>> going to egress on this port, in order for the real MTU on the port
>> itself to be observed.
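The "inverted problem" can be written down as a small computation, assuming a switch that enforces only a per-port RX length limit: for a port's MTU to be honored, every port that can forward to it needs an RX limit no larger than that MTU. The forwarding bitmask, struct rx_port and rx_limit() are hypothetical, and L2 header overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS 8

/* Hypothetical model of a switch that only checks frame length on RX. */
struct rx_port {
	int mtu;               /* MTU the user asked for on this port */
	unsigned int fwd_mask; /* bit j set: frames from here may egress port j */
};

/* Largest RX length port 'src' can admit without letting any frame egress
 * a port with a smaller MTU: the minimum of its own MTU and the MTUs of
 * every port it can forward to. */
static int rx_limit(const struct rx_port *ports, size_t n, size_t src)
{
	int limit;

	assert(src < n && n <= MAX_PORTS);
	limit = ports[src].mtu;
	for (size_t j = 0; j < n; j++)
		if ((ports[src].fwd_mask & (1u << j)) && ports[j].mtu < limit)
			limit = ports[j].mtu;

	return limit;
}
```

In a single bridge without port isolation every port can reach every other, so each limit collapses to the smallest MTU in the bridge.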
> 
> Looking at your example from the changelog:
> 
> ip link set dev sw0p0 master br0
> ip link set dev sw0p1 mtu 1400
> ip link set dev sw0p1 master br0
> 
> Without your patch, after these commands sw0p0 has an MTU of 1500 and
> sw0p1 has an MTU of 1400. Are you saying that a frame with a length of
> 1450 bytes received on sw0p0 will be able to egress sw0p1 (assuming it
> should be forwarded there)?
> 
> If so, then I think I understand the problem. However, I don't think
> such code belongs in the bridge driver as this restriction does not
> apply to all switches. Also, I think that having the kernel change MTU
> of port A following MTU change of port B is a bit surprising and not
> intuitive.
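Ido's scenario (sw0p0 at MTU 1500, sw0p1 at MTU 1400, a 1450-byte frame) can be checked against a toy model of an RX-only switch. The model assumes the driver maps each port's MTU directly onto its ingress length limit and ignores header overhead; struct sw_port, rx_admit(), hw_forward() and would_fit_egress() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the RX-only switch described in the thread. */
struct sw_port {
	const char *name;
	int mtu; /* configured with "ip link set dev ... mtu" */
};

/* Assumed: a frame is admitted if it fits the ingress port's limit,
 * which the driver derives from that port's MTU. */
static bool rx_admit(const struct sw_port *in, int len)
{
	return len <= in->mtu;
}

/* Hardware forwarding: the egress port's MTU is never consulted. */
static bool hw_forward(const struct sw_port *in, const struct sw_port *out,
		       int len)
{
	(void)out;
	return rx_admit(in, len);
}

/* What a TX-side length check on the egress port would decide. */
static bool would_fit_egress(const struct sw_port *out, int len)
{
	return len <= out->mtu;
}
```

In this model the 1450-byte frame is admitted on sw0p0 and forwarded, even though it does not fit sw0p1's MTU; in the reverse direction sw0p1's own ingress limit drops it.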
> 
> I think you should be more explicit about it. Did you consider listening
> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> unsupported configurations with an appropriate extack message? If you
> can't veto (in order not to break user space), you can still emit an
> extack message.
> 

+1, this sounds more appropriate IMO
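Ido's alternative moves the policy into the switch driver: a NETDEV_PRECHANGEMTU handler that rejects, or only warns about, configurations the hardware cannot enforce. The sketch below models just the decision such a handler might make, under the assumed policy that ports in the same bridge must share one MTU. It does not use the kernel's notifier or extack APIs (register_netdevice_notifier(), NL_SET_ERR_MSG_MOD()), and struct nb_port and check_mtu_change() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of ports on one switch; bridge_id 0 = standalone. */
struct nb_port {
	int bridge_id;
	int mtu;
};

/* Decide whether port 'idx' may change to 'new_mtu'. A bridged port must
 * match its bridged peers. With veto == false the change is allowed but
 * a message is still written, like a warning-only extack.
 * Returns 0 or -EINVAL; msg is left empty when there is nothing to say. */
static int check_mtu_change(const struct nb_port *ports, size_t n, size_t idx,
			    int new_mtu, bool veto, char *msg, size_t msg_len)
{
	assert(idx < n && msg != NULL && msg_len > 0);
	msg[0] = '\0';
	if (ports[idx].bridge_id == 0)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (i == idx || ports[i].bridge_id != ports[idx].bridge_id)
			continue;
		if (ports[i].mtu != new_mtu) {
			snprintf(msg, msg_len,
				 "MTU %d differs from bridged peer MTU %d; "
				 "hardware forwarding cannot enforce it",
				 new_mtu, ports[i].mtu);
			return veto ? -EINVAL : 0;
		}
	}
	return 0;
}
```

Setting veto to false corresponds to the case Ido mentions where rejecting the change would break user space, so only the message is emitted.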