[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOiHx==MmOsF4aeBf8zbEaHgN59Q4b=FMUEvLdJC9xXfFa5FLA@mail.gmail.com>
Date: Mon, 15 Dec 2025 11:13:58 +0100
From: Jonas Gorski <jonas.gorski@...il.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Ma Ke <make24@...as.ac.cn>,
Simon Horman <horms@...nel.org>, Florian Fainelli <f.fainelli@...il.com>
Subject: Re: [PATCH net 1/2] net: dsa: properly keep track of conduit reference
Hi,
On Sun, Dec 14, 2025 at 7:25 PM Vladimir Oltean <vladimir.oltean@....com> wrote:
>
> Problem description
> -------------------
>
> DSA has a mumbo-jumbo of reference handling of the conduit net device
> and its kobject which, sadly, is just wrong and doesn't make sense.
>
> There are two distinct problems.
>
> 1. The OF path, which uses of_find_net_device_by_node(), never releases
> the elevated refcount on the conduit's kobject. Nominally, the OF and
> non-OF paths should have identical code paths, and it is already
> suspicious that dsa_dev_to_net_device() has a put_device() call which
> is missing in dsa_port_parse_of(), but we can actually even verify
> that an issue exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run
> this command "before" and "after" applying this patch:
>
> (unbind the conduit driver for net device eno2)
> echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind
>
> we see these lines in the output diff which appear only with the patch
> applied:
>
> kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000)
> kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000)
>
> 2. After we find the conduit interface one way (OF) or another (non-OF),
> it can get unregistered at any time, and DSA remains with a long-lived,
> but in this case stale, cpu_dp->conduit pointer. Holding the net
> device's underlying kobject isn't actually of much help, it just
> prevents it from being freed (but we never need that kobject
> directly). What helps us to prevent the net device from being
> unregistered is the parallel netdev reference mechanism (dev_hold()
> and dev_put()).
>
> Actually we actually use that netdev tracker mechanism implicitly on
> user ports since commit 2f1e8ea726e9 ("net: dsa: link interfaces with
> the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link().
> But time still passes at DSA switch probe time between the initial
> of_find_net_device_by_node() code and the user port creation time, time
> during which the conduit could unregister itself and DSA wouldn't know
> about it.
>
> So we have to run of_find_net_device_by_node() under rtnl_lock() to
> prevent that from happening, and release the lock only with the netdev
> tracker having acquired the reference.
>
> Do we need to keep the reference until dsa_unregister_switch() /
> dsa_switch_shutdown()?
> 1: Maybe yes. A switch device will still be registered even if all user
> ports failed to probe, see commit 86f8b1c01a0a ("net: dsa: Do not
> make user port errors fatal"), and the cpu_dp->conduit pointers
> remain valid. I haven't audited all call paths to see whether they
> will actually use the conduit in lack of any user port, but if they
> do, it seems safer to not rely on user ports for that reference.
> 2. Definitely yes. We support changing the conduit which a user port is
> associated to, and we can get into a situation where we've moved all
> user ports away from a conduit, thus no longer hold any reference to
> it via the net device tracker. But we shouldn't let it go nonetheless
> - see the next change in relation to dsa_tree_find_first_conduit()
> and LAG conduits which disappear.
> We have to be prepared to return to the physical conduit, so the CPU
> port must explicitly keep another reference to it. This is also to
> say: the user ports and their CPU ports may not always keep a
> reference to the same conduit net device, and both are needed.
>
> As for the conduit's kobject for the /sys/class/net/ entry, we don't
> care about it, we can release it as soon as we hold the net device
> object itself.
>
> History and blame attribution
> -----------------------------
>
> The code has been refactored so many times, it is very difficult to
> follow and properly attribute a blame, but I'll try to make a short
> history which I hope to be correct.
>
> We have two distinct probing paths:
> - one for OF, introduced in 2016 in commit 83c0afaec7b7 ("net: dsa: Add
> new binding implementation")
> - one for non-OF, introduced in 2017 in commit 71e0bbde0d88 ("net: dsa:
> Add support for platform data")
>
> These are both complete rewrites of the original probing paths (which
> used struct dsa_switch_driver and other weird stuff, instead of regular
> devices on their respective buses for register access, like MDIO, SPI,
> I2C etc):
> - one for OF, introduced in 2013 in commit 5e95329b701c ("dsa: add
> device tree bindings to register DSA switches")
> - one for non-OF, introduced in 2015 in commit 91da11f870f0 ("net:
> Distributed Switch Architecture protocol support")
>
> except for tiny bits and pieces like dsa_dev_to_net_device() which were
> seemingly carried over since the original commit, and used to this day.
>
> The point is that the original probing paths received a fix in 2015 in
> the form of commit 679fb46c5785 ("net: dsa: Add missing master netdev
> dev_put() calls"), but the fix never made it into the "new" (dsa2)
> probing paths that can still be traced to today, and the fixed probing
> path was later deleted in 2019 in commit 93e86b3bc842 ("net: dsa: Remove
> legacy probing support").
>
> That is to say, the new probing paths were never quite correct in this
> area.
>
> The existence of the legacy probing support which was deleted in 2019
> explains why dsa_dev_to_net_device() returns a conduit with elevated
> refcount (because it was supposed to be released during
> dsa_remove_dst()). After the removal of the legacy code, the only user
> of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this
> function returns. This pattern makes no sense today, and can only be
> interpreted historically to understand why dev_hold() was there in the
> first place.
>
> Change details
> --------------
>
> Today we have a better netdev tracking infrastructure which we should
> use. It belongs in common code (dsa_port_parse_cpu()), which shows that
> the OF and non-OF code paths aren't actually so different.
>
> When dsa_port_parse_cpu() or any subsequent function during setup fails,
> dsa_switch_release_ports() will be called. However, dsa_port_parse_cpu()
> may fail prior to us assigning cpu_dp->conduit and bumping the reference.
> So we have to test for the conduit being NULL prior to calling
> netdev_put().
>
> There have still been so many transformations to the code since the
> blamed commits (rename master -> conduit, commit 0650bf52b31f ("net:
> dsa: be compatible with masters which unregister on shutdown")), that it
> only makes sense to fix the code using the best methods available today
> and see how it can be backported to stable later. I suspect the fix
> cannot even be backported to kernels which lack dsa_switch_shutdown(),
> and I suspect this is also maybe why the long-lived conduit reference
> didn't make it into the new DSA probing paths at the time (problems
> during shutdown).
>
> Because dsa_dev_to_net_device() has a single call site and has to be
> changed anyway, the logic was just absorbed into the non-OF
> dsa_port_parse().
Largely matches my observations.
>
> Tested on the ocelot/felix switch and on dsa_loop, both on the NXP
> LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y.
Also tested on b53, but since it neither has multiple conduits nor LAG
support (yet), the testing was very limited.
Two minor nits/comments though ...
>
> Reported-by: Ma Ke <make24@...as.ac.cn>
> Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/
> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> Fixes: 71e0bbde0d88 ("net: dsa: Add support for platform data")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
> include/net/dsa.h | 1 +
> net/dsa/dsa.c | 53 +++++++++++++++++++++++++----------------------
> 2 files changed, 29 insertions(+), 25 deletions(-)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index cced1a866757..6b2b5ed64ea4 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -302,6 +302,7 @@ struct dsa_port {
> struct devlink_port devlink_port;
> struct phylink *pl;
> struct phylink_config pl_config;
> + netdevice_tracker conduit_tracker;
> struct dsa_lag *lag;
> struct net_device *hsr_dev;
>
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index a20efabe778f..ac7900113d2b 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -1221,6 +1221,7 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *conduit,
> dst->tag_ops = tag_ops;
> }
>
> + netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL);
> dp->conduit = conduit;
> dp->type = DSA_PORT_TYPE_CPU;
> dsa_port_set_tag_protocol(dp, dst->tag_ops);
> @@ -1253,14 +1254,21 @@ static int dsa_port_parse_of(struct dsa_port *dp, struct device_node *dn)
> if (ethernet) {
> struct net_device *conduit;
> const char *user_protocol;
> + int err;
>
> + rtnl_lock();
> conduit = of_find_net_device_by_node(ethernet);
> of_node_put(ethernet);
> - if (!conduit)
> + if (!conduit) {
> + rtnl_unlock();
> return -EPROBE_DEFER;
> + }
>
> user_protocol = of_get_property(dn, "dsa-tag-protocol", NULL);
Maybe move this to before the rtnl_lock()? Not sure if this makes a
difference (avoiding a lookup for netdev not yet there vs avoiding a
lookup under rtnl lock).
> - return dsa_port_parse_cpu(dp, conduit, user_protocol);
> + err = dsa_port_parse_cpu(dp, conduit, user_protocol);
> + put_device(&conduit->dev);
> + rtnl_unlock();
> + return err;
> }
>
> if (link)
> @@ -1393,37 +1401,27 @@ static struct device *dev_find_class(struct device *parent, char *class)
> return device_find_child(parent, class, dev_is_class);
> }
>
> -static struct net_device *dsa_dev_to_net_device(struct device *dev)
> -{
> - struct device *d;
> -
> - d = dev_find_class(dev, "net");
> - if (d != NULL) {
> - struct net_device *nd;
> -
> - nd = to_net_dev(d);
> - dev_hold(nd);
> - put_device(d);
> -
> - return nd;
> - }
> -
> - return NULL;
> -}
> -
> static int dsa_port_parse(struct dsa_port *dp, const char *name,
> struct device *dev)
> {
> if (!strcmp(name, "cpu")) {
> struct net_device *conduit;
> + struct device *d;
> + int err;
>
> - conduit = dsa_dev_to_net_device(dev);
> - if (!conduit)
> + rtnl_lock();
> + d = dev_find_class(dev, "net");
> + if (!d) {
> + rtnl_unlock();
> return -EPROBE_DEFER;
> + }
>
> - dev_put(conduit);
> + conduit = to_net_dev(d);
>
> - return dsa_port_parse_cpu(dp, conduit, NULL);
> + err = dsa_port_parse_cpu(dp, conduit, NULL);
> + put_device(d);
> + rtnl_unlock();
> + return err;
> }
>
> if (!strcmp(name, "dsa"))
> @@ -1491,6 +1489,9 @@ static void dsa_switch_release_ports(struct dsa_switch *ds)
> struct dsa_vlan *v, *n;
>
> dsa_switch_for_each_port_safe(dp, next, ds) {
> + if (dsa_port_is_cpu(dp) && dp->conduit)
> + netdev_put(dp->conduit, &dp->conduit_tracker);
> +
> /* These are either entries that upper layers lost track of
> * (probably due to bugs), or installed through interfaces
> * where one does not necessarily have to remove them, like
> @@ -1635,8 +1636,10 @@ void dsa_switch_shutdown(struct dsa_switch *ds)
> /* Disconnect from further netdevice notifiers on the conduit,
> * since netdev_uses_dsa() will now return false.
> */
> - dsa_switch_for_each_cpu_port(dp, ds)
> + dsa_switch_for_each_cpu_port(dp, ds) {
> + netdev_put(dp->conduit, &dp->conduit_tracker);
> dp->conduit->dsa_ptr = NULL;
We should probably call netdev_put() only after clearing
dp->conduit->ds_ptr, not before.
With that addressed,
Reviewed-by: Jonas Gorski <jonas.gorski@...il.com>
> + }
>
> rtnl_unlock();
> out:
> --
> 2.43.0
>
Powered by blists - more mailing lists