[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240430162231.27fbd03c@dellmb>
Date: Tue, 30 Apr 2024 16:22:31 +0200
From: Marek Behún <kabel@...nel.org>
To: Vladimir Oltean <olteanv@...il.com>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>, Florian Fainelli
<f.fainelli@...il.com>
Subject: Re: [PATCH net-next 2/2] net: dsa: update the unicast MAC address
when changing conduit
On Tue, 30 Apr 2024 03:31:08 +0300
Vladimir Oltean <olteanv@...il.com> wrote:
> Hi Marek,
>
> On Mon, Apr 29, 2024 at 06:36:27PM +0200, Marek Behún wrote:
> > DSA exhibits different behavior regarding the primary unicast MAC
> > address stored in port standalone FDB and the conduit device UC
> > database while the interface is down vs up.
> >
> > If we put a switch port down while changing the conduit with
> > ip link set sw0p0 down
> > ip link set sw0p0 type dsa conduit conduit1
> > ip link set sw0p0 up
> > we delete the address in dsa_user_close() and install the (possibly
> > different) address dsa_user_open().
> >
> > But when changing the conduit on the fly, the old address is not
> > deleted and the new one is not installed.
> >
> > Since we explicitly want to support live-changing the conduit, uninstall
> > the old address before the dsa_port_change_conduit() call and install
> > the (possibly different) new one afterwards.
> >
> > Because the dsa_user_change_conduit() call tries to smoothly restore the
> > old conduit if anything fails while setting new one (except the MTU
> > change), this leaves us with the question about what to do if the
> > installation of the new address fails. Since we have already deleted the
> > old address, we can expect that restoring the old address would also fail,
> > and thus we can't revert the conduit change correctly. I have therefore
> > decided to treat it as a fatal error printed into the kernel log.
> >
> > Fixes: 95f510d0b792 ("net: dsa: allow the DSA master to be seen and changed through rtnetlink")
> > Signed-off-by: Marek Behún <kabel@...nel.org>
> > ---
>
> It's good to see you returning to the "multiple CPU ports" topic.
Didn't have time for this at all until now, unfortunately :-(
BTW, thank you for all your work on this. It must have been a lot of
time you spent on it...
> This is a good catch, though it's quite an interesting thing why I
> haven't noticed this during my own testing. Especially when the platform
> I tested has dsa_switch_supports_uc_filtering() == true, so it _has_ to
> install the host addresses correctly, because DSA then disables host
> flooding and not even ping would work.
>
> I _suspect_ it might be because I only tested the live migration when
> the port is under a bridge, and in that case, the user port MAC address
> also exists in the bridge FDB database as a BR_FDB_LOCAL entry, which
> _is_ replayed towards the new conduit. And when I did test standalone
> ports mode, it must have been only with a "cold" change of conduits.
>
> Anyway, logically the change makes perfect sense, though I would like to
> try and test it tomorrow (I need to rebuild the setup unfortunately).
>
> Just wondering, why didn't you do the dev->dev_addr migration as part of
> dsa_port_change_conduit() where the rest of the object migration is,
> near or even as part of dsa_user_sync_ha()?
Because I didn't think of it. Of course it makes much more sense...
>
> > net/dsa/user.c | 45 +++++++++++++++++++++++++++++++++++++--------
> > 1 file changed, 37 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/dsa/user.c b/net/dsa/user.c
> > index b1d8d1827f91..70d7be1b6a79 100644
> > --- a/net/dsa/user.c
> > +++ b/net/dsa/user.c
> > @@ -2767,9 +2767,37 @@ int dsa_user_change_conduit(struct net_device *dev, struct net_device *conduit,
> > if (err)
> > goto out_revert_old_conduit_unlink;
> >
> > + /* If live-changing, we also need to uninstall the user device address
> > + * from the port FDB and the conduit interface. This has to be done
> > + * before the conduit is changed.
> > + */
> > + if (dev->flags & IFF_UP)
> > + dsa_user_host_uc_uninstall(dev);
> > +
> > err = dsa_port_change_conduit(dp, conduit, extack);
> > if (err)
> > - goto out_revert_conduit_link;
> > + goto out_revert_host_address;
> > +
> > + /* If the port doesn't have its own MAC address and relies on the DSA
> > + * conduit's one, inherit it again from the new DSA conduit.
> > + */
> > + if (is_zero_ether_addr(dp->mac))
> > + eth_hw_addr_inherit(dev, conduit);
> > +
> > + /* If live-changing, we need to install the user device address to the
> > + * port FDB and the conduit interface. Since the device address needs to
> > + * be installed towards the new conduit in the port FDB, this needs to
> > + * be done after the conduit is changed.
> > + */
> > + if (dev->flags & IFF_UP) {
> > + err = dsa_user_host_uc_install(dev, dev->dev_addr);
> > + if (err) {
> > + netdev_err(dev,
> > + "fatal error installing new host address: %pe\n",
> > + ERR_PTR(err));
> > + return err;
>
> Even though there are still things that the user can try to do if this
> fails (like putting the conduit in promiscuous mode, and limp on in a
> degraded state), I do agree with error checking, to not give the user
> process the false impression that all is well.
>
> However, this is treated way too fatally here (so as to "return err" without
> even attempting to do a rewind), when in reality it could be recoverable.
> When moving the logic to dsa_port_change_conduit() please integrate with
> the existing rewind flow.
>
> Keep in mind that the RX filtering database of the switch or the conduit
> may be limited in size, and may really run out. For that reason, your
> dsa_user_host_uc_install() call should be placed _before_ the
> dsa_user_sync_ha() logic that syncs the uc/mc secondary address lists.
> Those are unchecked-for errors (partly because it's very hard to do: you
> need to synchronize a deferred work context with a process context), and
> they could easily fill up the filtering tables of the conduit. So let's
> prioritize the (single) standalone MAC address of the user port.
>
OK, I'll send another version in ~2-3 days. In the meantime, have you
had time to test the change?
Marek
Powered by blists - more mailing lists