netdev - Re: [PATCH net 3/3] net: dsa: Include tagger overhead when setting MTU for DSA and CPU ports

Message-ID: <20210525091016.sxxabqfjac6f3dhg@skbuf>
Date:   Tue, 25 May 2021 09:10:16 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Andrew Lunn <andrew@...n.ch>
CC:     David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        Florian Fainelli <f.fainelli@...il.com>,
        "cao88yu@...il.com" <cao88yu@...il.com>
Subject: Re: [PATCH net 3/3] net: dsa: Include tagger overhead when setting
 MTU for DSA and CPU ports

On Tue, May 25, 2021 at 04:53:39AM +0200, Andrew Lunn wrote:
> On Mon, May 24, 2021 at 10:04:01PM +0000, Vladimir Oltean wrote:
> > On Mon, May 24, 2021 at 11:33:13PM +0200, Andrew Lunn wrote:
> > > Some members of the Marvell Ethernet switch family impose MTU restrictions
> > > on ports used for connecting to the CPU or DSA. If the MTU is set too
> > > low, tagged frames will be discarded. Ensure the tagger overhead is
> > > included in setting the MTU for DSA and CPU ports.
> > > 
> > > Fixes: 1baf0fac10fb ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
> > > Reported-by: 曹煜 <cao88yu@...il.com>
> > > Signed-off-by: Andrew Lunn <andrew@...n.ch>
> > > ---
> > 
> > Some switches account for the DSA tag automatically in hardware. So far
> > it has been the convention that if a switch doesn't do that, the driver
> > should, not DSA.
> 
> O.K.
> 
> This is going to be a little bit interesting with Tobias's support for
> changing the tag protocol. I need to look at the ordering.

The dsa_switch_change_tag_proto() notifier handler already iterates
through user ports and calls dsa_slave_change_mtu(), which triggers the
whole shebang: it calculates the largest_mtu of the ds, changes the
master MTU to that value plus the tagger overhead, emits a notifier for
the CPU port MTU change and another one for the user port MTU change,
and finally triggers the bridge MTU normalization logic if the MTU is
in fact an MRU.
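To make the order of those steps concrete, here is a minimal user-space sketch for a single switch, assuming the port layout of sw0 in the heat map further down (port 0 is the CPU port, port 3 a DSA link, ports 1, 2 and 4 user ports). Everything named mock_* is hypothetical and not DSA code, MOCK_TAG_OVERHEAD = 8 is an arbitrary assumed tag size, and neither cross-chip propagation nor bridge MTU normalization is modelled.

```c
#include <assert.h>

/* Hypothetical single-switch model; not the kernel's struct dsa_switch. */
#define MOCK_NUM_PORTS    5
#define MOCK_TAG_OVERHEAD 8	/* assumed tagger overhead, in bytes */

enum mock_port_type { MOCK_PORT_CPU, MOCK_PORT_DSA, MOCK_PORT_USER };

struct mock_switch {
	enum mock_port_type type[MOCK_NUM_PORTS];
	int port_mtu[MOCK_NUM_PORTS];	/* MTU programmed into each port */
	int master_mtu;			/* MTU of the DSA master interface */
	int cpu_port;
};

/* Mock of the steps dsa_slave_change_mtu() goes through, as listed above. */
static void mock_slave_change_mtu(struct mock_switch *ds, int port, int new_mtu)
{
	int largest_mtu = 0;

	assert(port >= 0 && port < MOCK_NUM_PORTS);
	assert(ds->type[port] == MOCK_PORT_USER);

	/* 1. largest_mtu over the user ports of this switch only, pretending
	 *    that @new_mtu has already been applied to @port.
	 */
	for (int i = 0; i < MOCK_NUM_PORTS; i++) {
		int mtu;

		if (ds->type[i] != MOCK_PORT_USER)
			continue;

		mtu = (i == port) ? new_mtu : ds->port_mtu[i];
		if (mtu > largest_mtu)
			largest_mtu = mtu;
	}

	/* 2. The master has to carry largest_mtu plus the DSA tag. */
	ds->master_mtu = largest_mtu + MOCK_TAG_OVERHEAD;

	/* 3. CPU port MTU change (sent with propagate_upstream == true). */
	ds->port_mtu[ds->cpu_port] = largest_mtu;

	/* 4. Targeted MTU change for the user port itself. */
	ds->port_mtu[port] = new_mtu;

	/* 5. Bridge MTU normalization (MRU semantics) is not modelled. */
}
```

Because largest_mtu is recomputed from scratch on every call, lowering the only 9000-byte user port back to 1500 also shrinks the master and CPU port MTU again.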

So when the tagging protocol changes, you get re-notified to change the
MTU on the CPU port to the largest_mtu. That part should work correctly.

What could be interesting, and this is something I had to check, is
whether the proper MTU values are propagated to the DSA links.
dsa_slave_change_mtu() calls dsa_port_mtu_change() with
propagate_upstream == true for the CPU port (which is programmed with
the largest_mtu of the switch) and that triggers this matching logic:

static bool dsa_switch_mtu_match(struct dsa_switch *ds, int port,
				 struct dsa_notifier_mtu_info *info)
{
	if (ds->index == info->sw_index)
		return (port == info->port) || dsa_is_dsa_port(ds, port);

	if (!info->propagate_upstream)
		return false;

	if (dsa_is_dsa_port(ds, port) || dsa_is_cpu_port(ds, port)) /* returns true for all DSA links here */
		return true;

	return false;
}

The "propagate_upstream" name is a bit of a misnomer, since it really
means "propagate to other switches". What I really needed at the time,
with "propagate_upstream == false", was a way to send a targeted MTU
change (for the user port itself) that bypasses the cross-chip notifiers.

My updated cross-chip notifier simulator
(https://patchwork.kernel.org/project/netdevbpf/patch/20210222120248.1415075-1-olteanv@gmail.com/)
shows this "heat map" for a notifier emitted on the CPU port (port 0 of
switch 0) with propagate_upstream == true:

Heat map for test notifier emitted on sw0p0:

   sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
[  cpu  ] [  user ] [  user ] [  dsa  ] [  user ]
[   x   ] [       ] [       ] [   x   ] [       ]
                                  |
                                  +---------+
                                            |
   sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
[  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
[       ] [       ] [       ] [   x   ] [   x   ]
                                  |
                                  +---------+
                                            |
   sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
[  user ] [  user ] [  user ] [  user ] [  dsa  ]
[       ] [       ] [       ] [       ] [   x   ]

So in this case the largest_mtu ends up being programmed into all DSA
links.
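The heat map can be reproduced by evaluating the same matching rules for every port of every switch, since in this model each switch in the tree receives the notifier and checks each of its own ports. This is a minimal user-space sketch assuming the three-switch layout drawn above; mock_tree, struct mock_mtu_info and the mock_* functions are hypothetical stand-ins, not kernel structures.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_NUM_SWITCHES 3
#define MOCK_NUM_PORTS    5

enum mock_port_type { MOCK_PORT_CPU, MOCK_PORT_DSA, MOCK_PORT_USER };

/* Port roles copied from the heat map (sw0..sw2, ports 0..4). */
static const enum mock_port_type mock_tree[MOCK_NUM_SWITCHES][MOCK_NUM_PORTS] = {
	{ MOCK_PORT_CPU,  MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_DSA,  MOCK_PORT_USER },
	{ MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_DSA,  MOCK_PORT_DSA  },
	{ MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_DSA  },
};

/* Cut-down stand-in for struct dsa_notifier_mtu_info. */
struct mock_mtu_info {
	int sw_index;
	int port;
	bool propagate_upstream;
};

/* Same decisions as dsa_switch_mtu_match(), evaluated on the mock tree. */
static bool mock_mtu_match(int sw, int port, const struct mock_mtu_info *info)
{
	enum mock_port_type type;

	assert(sw >= 0 && sw < MOCK_NUM_SWITCHES);
	assert(port >= 0 && port < MOCK_NUM_PORTS);
	type = mock_tree[sw][port];

	if (sw == info->sw_index)
		return port == info->port || type == MOCK_PORT_DSA;

	if (!info->propagate_upstream)
		return false;

	return type == MOCK_PORT_DSA || type == MOCK_PORT_CPU;
}

/* Fill @hit with the ports that act on the notifier; return how many. */
static int mock_heat_map(const struct mock_mtu_info *info,
			 bool hit[MOCK_NUM_SWITCHES][MOCK_NUM_PORTS])
{
	int count = 0;

	for (int sw = 0; sw < MOCK_NUM_SWITCHES; sw++) {
		for (int port = 0; port < MOCK_NUM_PORTS; port++) {
			hit[sw][port] = mock_mtu_match(sw, port, info);
			count += hit[sw][port];
		}
	}
	return count;
}
```

A notifier for sw0p0 with propagate_upstream == true hits the five ports marked "x" above. A targeted one (propagate_upstream == false) emitted for sw0p1 hits sw0p1 and also sw0p3, the local DSA link, which is the behaviour questioned in (b).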

There are 2 potential problems I see:
(a) the largest_mtu is calculated as the maximum over all user ports of
    @ds. But since it is propagated to the entire tree, maybe it should
    be the maximum across the entire @dst, to avoid this situation:

ip link set sw0p1 mtu 9000
ip link set sw1p1 mtu 1500 # oops, this changes the largest_mtu of the CPU port, breaking termination for sw0p1

(b) I don't remember why I didn't make the targeted notifier
    (propagate_upstream == false) even more targeted, towards only the
    port on which it was emitted. Instead, the DSA links of that switch
    are targeted too, and that is probably a mistake:

	if (ds->index == info->sw_index)
		return (port == info->port) || dsa_is_dsa_port(ds, port);

because if the DSA links of the entire dst were programmed in a previous
round to the largest_mtu via a "propagate_upstream == true" notification,
then the dsa_port_mtu_change(propagate_upstream == false) call that
immediately follows will break the MTU of the one DSA link that sits on
the same chip as the dp whose MTU is being changed: that link gets
reprogrammed from the largest_mtu to the new, possibly smaller, value.

Example:

ip link set sw0p1 mtu 9000
ip link set sw2p1 mtu 9000 # at this stage, sw0p1 and sw2p1 can talk to one another using jumbo frames
ip link set sw0p2 mtu 1500 # this programs the sw0p3 DSA link first to
                           # the largest_mtu of 9000, then reprograms it to 1500 with the
                           # "propagate_upstream == false" notifier, breaking communication between
                           # sw0p1 and sw2p1
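Both (a) and (b) can be replayed with a small user-space model of the tree from the heat map. struct mock_opts and its two flags are hypothetical illustrations of the fixes floated above (a tree-wide largest_mtu, and a targeted notifier that skips local DSA links), not existing kernel code; all mock_* names are made up. Following the description above, every MTU change first emits the CPU-port notifier carrying largest_mtu and then the targeted one; the master MTU is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_NUM_SWITCHES 3
#define MOCK_NUM_PORTS    5

enum mock_port_type { MOCK_PORT_CPU, MOCK_PORT_DSA, MOCK_PORT_USER };

/* Port roles from the heat map; the only CPU port is sw0p0. */
static const enum mock_port_type mock_tree[MOCK_NUM_SWITCHES][MOCK_NUM_PORTS] = {
	{ MOCK_PORT_CPU,  MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_DSA,  MOCK_PORT_USER },
	{ MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_DSA,  MOCK_PORT_DSA  },
	{ MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_USER, MOCK_PORT_DSA  },
};

/* MTU currently programmed into each port of the mock tree. */
static int mock_mtu[MOCK_NUM_SWITCHES][MOCK_NUM_PORTS];

/* Hypothetical toggles for the fixes discussed in (a) and (b). */
struct mock_opts {
	bool dst_wide_largest;	/* (a): largest_mtu over the whole tree */
	bool narrow_targeted;	/* (b): targeted notifier skips local DSA links */
};

struct mock_mtu_info {
	int sw_index;
	int port;
	bool propagate_upstream;
	int mtu;
};

static bool mock_mtu_match(int sw, int port, const struct mock_mtu_info *info,
			   const struct mock_opts *opts)
{
	bool is_dsa = mock_tree[sw][port] == MOCK_PORT_DSA;
	bool is_cpu = mock_tree[sw][port] == MOCK_PORT_CPU;

	if (sw == info->sw_index) {
		if (port == info->port)
			return true;
		/* Local DSA links must still follow propagate_upstream == true. */
		if (opts->narrow_targeted && !info->propagate_upstream)
			return false;
		return is_dsa;
	}
	if (!info->propagate_upstream)
		return false;
	return is_dsa || is_cpu;
}

static void mock_notify(const struct mock_mtu_info *info, const struct mock_opts *opts)
{
	for (int sw = 0; sw < MOCK_NUM_SWITCHES; sw++)
		for (int port = 0; port < MOCK_NUM_PORTS; port++)
			if (mock_mtu_match(sw, port, info, opts))
				mock_mtu[sw][port] = info->mtu;
}

static void mock_reset(int mtu)
{
	for (int sw = 0; sw < MOCK_NUM_SWITCHES; sw++)
		for (int port = 0; port < MOCK_NUM_PORTS; port++)
			mock_mtu[sw][port] = mtu;
}

/* Mock of "ip link set swXpY mtu N": CPU-port notifier carrying largest_mtu
 * (propagate_upstream == true), then the targeted one for the user port.
 */
static void mock_set_mtu(int sw, int port, int new_mtu, const struct mock_opts *opts)
{
	struct mock_mtu_info cpu_info = { 0, 0, true, 0 };	/* sw0p0 */
	struct mock_mtu_info port_info = { sw, port, false, new_mtu };

	assert(mock_tree[sw][port] == MOCK_PORT_USER);

	for (int s = 0; s < MOCK_NUM_SWITCHES; s++) {
		if (s != sw && !opts->dst_wide_largest)
			continue;
		for (int p = 0; p < MOCK_NUM_PORTS; p++) {
			int mtu;

			if (mock_tree[s][p] != MOCK_PORT_USER)
				continue;
			mtu = (s == sw && p == port) ? new_mtu : mock_mtu[s][p];
			if (mtu > cpu_info.mtu)
				cpu_info.mtu = mtu;
		}
	}

	mock_notify(&cpu_info, opts);
	mock_notify(&port_info, opts);
}
```

With both flags cleared, the (a) sequence leaves the CPU port (sw0p0) at 1500, and the last step of the (b) example leaves sw0p3 at 1500. Setting dst_wide_largest keeps sw0p0 at 9000 in (a), and setting narrow_targeted keeps sw0p3 at 9000 in (b). The narrower match still has to accept local DSA links for propagate_upstream == true notifiers, otherwise sw0p3 would never receive the largest_mtu at all.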