Message-ID: <20240430003108.4dyjlavsledkbvot@skbuf>
Date: Tue, 30 Apr 2024 03:31:08 +0300
From: Vladimir Oltean <olteanv@...il.com>
To: Marek Behún <kabel@...nel.org>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>
Subject: Re: [PATCH net-next 2/2] net: dsa: update the unicast MAC address
when changing conduit
Hi Marek,
On Mon, Apr 29, 2024 at 06:36:27PM +0200, Marek Behún wrote:
> DSA exhibits different behavior regarding the primary unicast MAC
> address stored in port standalone FDB and the conduit device UC
> database while the interface is down vs up.
>
> If we put a switch port down while changing the conduit with
> ip link set sw0p0 down
> ip link set sw0p0 type dsa conduit conduit1
> ip link set sw0p0 up
> we delete the address in dsa_user_close() and install the (possibly
> different) address in dsa_user_open().
>
> But when changing the conduit on the fly, the old address is not
> deleted and the new one is not installed.
>
> Since we explicitly want to support live-changing the conduit, uninstall
> the old address before the dsa_port_change_conduit() call and install
> the (possibly different) new one afterwards.
>
> Because the dsa_user_change_conduit() call tries to smoothly restore the
> old conduit if anything fails while setting the new one (except the MTU
> change), this leaves us with the question about what to do if the
> installation of the new address fails. Since we have already deleted the
> old address, we can expect that restoring the old address would also fail,
> and thus we can't revert the conduit change correctly. I have therefore
> decided to treat it as a fatal error printed into the kernel log.
>
> Fixes: 95f510d0b792 ("net: dsa: allow the DSA master to be seen and changed through rtnetlink")
> Signed-off-by: Marek Behún <kabel@...nel.org>
> ---
It's good to see you returning to the "multiple CPU ports" topic.
This is a good catch, though it's curious that I haven't noticed this
during my own testing, especially since the platform
I tested has dsa_switch_supports_uc_filtering() == true, so it _has_ to
install the host addresses correctly, because DSA then disables host
flooding and not even ping would work.
I _suspect_ it might be because I only tested the live migration when
the port is under a bridge, and in that case, the user port MAC address
also exists in the bridge FDB as a BR_FDB_LOCAL entry, which
_is_ replayed towards the new conduit. And when I did test standalone
port mode, it must have been only with a "cold" change of conduits.
Anyway, logically the change makes perfect sense, though I would like to
try and test it tomorrow (I need to rebuild the setup unfortunately).
Just wondering, why didn't you do the dev->dev_addr migration as part of
dsa_port_change_conduit() where the rest of the object migration is,
near or even as part of dsa_user_sync_ha()?
> net/dsa/user.c | 45 +++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/net/dsa/user.c b/net/dsa/user.c
> index b1d8d1827f91..70d7be1b6a79 100644
> --- a/net/dsa/user.c
> +++ b/net/dsa/user.c
> @@ -2767,9 +2767,37 @@ int dsa_user_change_conduit(struct net_device *dev, struct net_device *conduit,
> if (err)
> goto out_revert_old_conduit_unlink;
>
> + /* If live-changing, we also need to uninstall the user device address
> + * from the port FDB and the conduit interface. This has to be done
> + * before the conduit is changed.
> + */
> + if (dev->flags & IFF_UP)
> + dsa_user_host_uc_uninstall(dev);
> +
> err = dsa_port_change_conduit(dp, conduit, extack);
> if (err)
> - goto out_revert_conduit_link;
> + goto out_revert_host_address;
> +
> + /* If the port doesn't have its own MAC address and relies on the DSA
> + * conduit's one, inherit it again from the new DSA conduit.
> + */
> + if (is_zero_ether_addr(dp->mac))
> + eth_hw_addr_inherit(dev, conduit);
> +
> + /* If live-changing, we need to install the user device address to the
> + * port FDB and the conduit interface. Since the device address needs to
> + * be installed towards the new conduit in the port FDB, this needs to
> + * be done after the conduit is changed.
> + */
> + if (dev->flags & IFF_UP) {
> + err = dsa_user_host_uc_install(dev, dev->dev_addr);
> + if (err) {
> + netdev_err(dev,
> + "fatal error installing new host address: %pe\n",
> + ERR_PTR(err));
> + return err;
Even though there are still things that the user can try to do if this
fails (like putting the conduit in promiscuous mode, and limp on in a
degraded state), I do agree with error checking, to not give the user
process the false impression that all is well.
However, this is treated way too fatally here (so as to "return err" without
even attempting to do a rewind), when in reality it could be recoverable.
When moving the logic to dsa_port_change_conduit() please integrate with
the existing rewind flow.
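To illustrate the shape of a rewind on install failure, here is a small
userspace model, not kernel code. Everything in it (struct model_port, the
model_* helpers, the bad_conduit knob) is hypothetical and exists only for
this sketch; it assumes that re-installing the address towards the old
conduit is a best-effort step.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a user port; not the kernel's data structures. */
struct model_port {
	int conduit;      /* conduit currently in use */
	int addr_conduit; /* conduit the host address is installed towards, -1 if none */
	int bad_conduit;  /* conduit on which installs fail, -1 if none (test knob) */
};

/* Install the host address towards the current conduit. */
static int model_host_uc_install(struct model_port *p)
{
	if (p->conduit == p->bad_conduit)
		return -ENOSPC;
	p->addr_conduit = p->conduit;
	return 0;
}

/* Remove the host address from wherever it is installed. */
static void model_host_uc_uninstall(struct model_port *p)
{
	p->addr_conduit = -1;
}

/* Change conduit; on install failure, rewind instead of bailing out. */
int model_change_conduit(struct model_port *p, int new_conduit)
{
	int old_conduit = p->conduit;
	int err;

	assert(new_conduit >= 0);

	model_host_uc_uninstall(p);
	p->conduit = new_conduit;

	err = model_host_uc_install(p);
	if (err)
		goto out_rewind;

	return 0;

out_rewind:
	/* Go back to the old conduit and try to put the address back.
	 * Assumption of this sketch: this is best effort, so its result is
	 * ignored and the original error is what the caller sees.
	 */
	p->conduit = old_conduit;
	(void)model_host_uc_install(p);
	return err;
}
```

In this model the caller still gets the error, but the port ends up on the
old conduit with its address back in place rather than in a half-migrated
state.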
Keep in mind that the RX filtering database of the switch or the conduit
may be limited in size, and may really run out. For that reason, your
dsa_user_host_uc_install() call should be placed _before_ the
dsa_user_sync_ha() logic that syncs the uc/mc secondary address lists.
Errors from that sync are not checked (partly because checking is very
hard: you need to synchronize a deferred work context with a process
context), and those lists could easily fill up the filtering tables of
the conduit. So let's
prioritize the (single) standalone MAC address of the user port.
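The effect of the ordering can be seen in a toy userspace model, not kernel
code. The table size and all model_* names are made up for this sketch; the
only assumption carried over from the discussion is that secondary address
sync errors are silently dropped, while the primary address install is
checked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_TABLE_SIZE 4 /* hypothetical filtering table capacity */

struct model_filter_table {
	size_t used; /* number of entries currently occupied */
};

/* Add one address; fails once the table is full. */
static int model_table_add(struct model_filter_table *t)
{
	assert(t->used <= MODEL_TABLE_SIZE);
	if (t->used == MODEL_TABLE_SIZE)
		return -ENOSPC;
	t->used++;
	return 0;
}

/* Sync n secondary addresses; errors are dropped, as in a deferred context. */
static void model_sync_secondary(struct model_filter_table *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		(void)model_table_add(t);
}

/* Primary address first: it gets a slot before the lists can fill the table. */
int model_migrate_primary_first(struct model_filter_table *t, size_t n_secondary)
{
	int err = model_table_add(t);

	if (err)
		return err;
	model_sync_secondary(t, n_secondary);
	return 0;
}

/* Primary address last: a long secondary list can starve it. */
int model_migrate_primary_last(struct model_filter_table *t, size_t n_secondary)
{
	model_sync_secondary(t, n_secondary);
	return model_table_add(t);
}
```

With more secondary addresses than table entries, the first variant succeeds
and only secondary addresses are lost, while the second variant fails on the
one address the port cannot do without.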