[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20251215110424.ovvp3x6mgf765gzc@skbuf>
Date: Mon, 15 Dec 2025 13:04:24 +0200
From: Vladimir Oltean <vladimir.oltean@....com>
To: Jonas Gorski <jonas.gorski@...il.com>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Ma Ke <make24@...as.ac.cn>, Simon Horman <horms@...nel.org>,
Florian Fainelli <f.fainelli@...il.com>
Subject: Re: [PATCH net 1/2] net: dsa: properly keep track of conduit
reference
On Mon, Dec 15, 2025 at 11:13:58AM +0100, Jonas Gorski wrote:
> Hi,
>
> On Sun, Dec 14, 2025 at 7:25 PM Vladimir Oltean <vladimir.oltean@....com> wrote:
> >
> > Problem description
> > -------------------
> >
> > DSA has a mumbo-jumbo of reference handling of the conduit net device
> > and its kobject which, sadly, is just wrong and doesn't make sense.
> >
> > There are two distinct problems.
> >
> > 1. The OF path, which uses of_find_net_device_by_node(), never releases
> > the elevated refcount on the conduit's kobject. Nominally, the OF and
> > non-OF paths should have identical code paths, and it is already
> > suspicious that dsa_dev_to_net_device() has a put_device() call which
> > is missing in dsa_port_parse_of(), but we can actually even verify
> > that an issue exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run
> > this command "before" and "after" applying this patch:
> >
> > (unbind the conduit driver for net device eno2)
> > echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind
> >
> > we see these lines in the output diff which appear only with the patch
> > applied:
> >
> > kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000)
> > kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000)
> >
> > 2. After we find the conduit interface one way (OF) or another (non-OF),
> > it can get unregistered at any time, and DSA remains with a long-lived,
> > but in this case stale, cpu_dp->conduit pointer. Holding the net
> > device's underlying kobject isn't actually of much help, it just
> > prevents it from being freed (but we never need that kobject
> > directly). What helps us to prevent the net device from being
> > unregistered is the parallel netdev reference mechanism (dev_hold()
> > and dev_put()).
> >
> > Actually we actually use that netdev tracker mechanism implicitly on
> > user ports since commit 2f1e8ea726e9 ("net: dsa: link interfaces with
> > the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link().
> > But time still passes at DSA switch probe time between the initial
> > of_find_net_device_by_node() code and the user port creation time, time
> > during which the conduit could unregister itself and DSA wouldn't know
> > about it.
> >
> > So we have to run of_find_net_device_by_node() under rtnl_lock() to
> > prevent that from happening, and release the lock only with the netdev
> > tracker having acquired the reference.
> >
> > Do we need to keep the reference until dsa_unregister_switch() /
> > dsa_switch_shutdown()?
> > 1: Maybe yes. A switch device will still be registered even if all user
> > ports failed to probe, see commit 86f8b1c01a0a ("net: dsa: Do not
> > make user port errors fatal"), and the cpu_dp->conduit pointers
> > remain valid. I haven't audited all call paths to see whether they
> > will actually use the conduit in lack of any user port, but if they
> > do, it seems safer to not rely on user ports for that reference.
> > 2. Definitely yes. We support changing the conduit which a user port is
> > associated to, and we can get into a situation where we've moved all
> > user ports away from a conduit, thus no longer hold any reference to
> > it via the net device tracker. But we shouldn't let it go nonetheless
> > - see the next change in relation to dsa_tree_find_first_conduit()
> > and LAG conduits which disappear.
> > We have to be prepared to return to the physical conduit, so the CPU
> > port must explicitly keep another reference to it. This is also to
> > say: the user ports and their CPU ports may not always keep a
> > reference to the same conduit net device, and both are needed.
> >
> > As for the conduit's kobject for the /sys/class/net/ entry, we don't
> > care about it, we can release it as soon as we hold the net device
> > object itself.
> >
> > History and blame attribution
> > -----------------------------
> >
> > The code has been refactored so many times, it is very difficult to
> > follow and properly attribute a blame, but I'll try to make a short
> > history which I hope to be correct.
> >
> > We have two distinct probing paths:
> > - one for OF, introduced in 2016 in commit 83c0afaec7b7 ("net: dsa: Add
> > new binding implementation")
> > - one for non-OF, introduced in 2017 in commit 71e0bbde0d88 ("net: dsa:
> > Add support for platform data")
> >
> > These are both complete rewrites of the original probing paths (which
> > used struct dsa_switch_driver and other weird stuff, instead of regular
> > devices on their respective buses for register access, like MDIO, SPI,
> > I2C etc):
> > - one for OF, introduced in 2013 in commit 5e95329b701c ("dsa: add
> > device tree bindings to register DSA switches")
> > - one for non-OF, introduced in 2015 in commit 91da11f870f0 ("net:
> > Distributed Switch Architecture protocol support")
> >
> > except for tiny bits and pieces like dsa_dev_to_net_device() which were
> > seemingly carried over since the original commit, and used to this day.
> >
> > The point is that the original probing paths received a fix in 2015 in
> > the form of commit 679fb46c5785 ("net: dsa: Add missing master netdev
> > dev_put() calls"), but the fix never made it into the "new" (dsa2)
> > probing paths that can still be traced to today, and the fixed probing
> > path was later deleted in 2019 in commit 93e86b3bc842 ("net: dsa: Remove
> > legacy probing support").
> >
> > That is to say, the new probing paths were never quite correct in this
> > area.
> >
> > The existence of the legacy probing support which was deleted in 2019
> > explains why dsa_dev_to_net_device() returns a conduit with elevated
> > refcount (because it was supposed to be released during
> > dsa_remove_dst()). After the removal of the legacy code, the only user
> > of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this
> > function returns. This pattern makes no sense today, and can only be
> > interpreted historically to understand why dev_hold() was there in the
> > first place.
> >
> > Change details
> > --------------
> >
> > Today we have a better netdev tracking infrastructure which we should
> > use. It belongs in common code (dsa_port_parse_cpu()), which shows that
> > the OF and non-OF code paths aren't actually so different.
> >
> > When dsa_port_parse_cpu() or any subsequent function during setup fails,
> > dsa_switch_release_ports() will be called. However, dsa_port_parse_cpu()
> > may fail prior to us assigning cpu_dp->conduit and bumping the reference.
> > So we have to test for the conduit being NULL prior to calling
> > netdev_put().
> >
> > There have still been so many transformations to the code since the
> > blamed commits (rename master -> conduit, commit 0650bf52b31f ("net:
> > dsa: be compatible with masters which unregister on shutdown")), that it
> > only makes sense to fix the code using the best methods available today
> > and see how it can be backported to stable later. I suspect the fix
> > cannot even be backported to kernels which lack dsa_switch_shutdown(),
> > and I suspect this is also maybe why the long-lived conduit reference
> > didn't make it into the new DSA probing paths at the time (problems
> > during shutdown).
> >
> > Because dsa_dev_to_net_device() has a single call site and has to be
> > changed anyway, the logic was just absorbed into the non-OF
> > dsa_port_parse().
>
> Largely matches my observations.
>
> >
> > Tested on the ocelot/felix switch and on dsa_loop, both on the NXP
> > LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y.
>
> Also tested on b53, but since it neither has multiple conduits nor LAG
> support (yet), the testing was very limited.
>
> Two minor nits/comments though ...
> >
> > Reported-by: Ma Ke <make24@...as.ac.cn>
> > Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/
> > Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> > Fixes: 71e0bbde0d88 ("net: dsa: Add support for platform data")
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> > ---
> > include/net/dsa.h | 1 +
> > net/dsa/dsa.c | 53 +++++++++++++++++++++++++----------------------
> > 2 files changed, 29 insertions(+), 25 deletions(-)
> >
> > diff --git a/include/net/dsa.h b/include/net/dsa.h
> > index cced1a866757..6b2b5ed64ea4 100644
> > --- a/include/net/dsa.h
> > +++ b/include/net/dsa.h
> > @@ -302,6 +302,7 @@ struct dsa_port {
> > struct devlink_port devlink_port;
> > struct phylink *pl;
> > struct phylink_config pl_config;
> > + netdevice_tracker conduit_tracker;
> > struct dsa_lag *lag;
> > struct net_device *hsr_dev;
> >
> > diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> > index a20efabe778f..ac7900113d2b 100644
> > --- a/net/dsa/dsa.c
> > +++ b/net/dsa/dsa.c
> > @@ -1221,6 +1221,7 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *conduit,
> > dst->tag_ops = tag_ops;
> > }
> >
> > + netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL);
> > dp->conduit = conduit;
> > dp->type = DSA_PORT_TYPE_CPU;
> > dsa_port_set_tag_protocol(dp, dst->tag_ops);
> > @@ -1253,14 +1254,21 @@ static int dsa_port_parse_of(struct dsa_port *dp, struct device_node *dn)
> > if (ethernet) {
> > struct net_device *conduit;
> > const char *user_protocol;
> > + int err;
> >
> > + rtnl_lock();
> > conduit = of_find_net_device_by_node(ethernet);
> > of_node_put(ethernet);
> > - if (!conduit)
> > + if (!conduit) {
> > + rtnl_unlock();
> > return -EPROBE_DEFER;
> > + }
> >
> > user_protocol = of_get_property(dn, "dsa-tag-protocol", NULL);
>
> Maybe move this to before the rtnl_lock()? Not sure if this makes a
> difference (avoiding a lookup for netdev not yet there vs avoiding a
> lookup under rtnl lock).
If keeping the rtnl mutex acquired for too long is a concern (dsa_port_parse_cpu()
also calls request_module(), so there's that), I guess I could also reconsider
moving the netdev_hold() call right after the successful of_find_net_device_by_node()
call, and release the rtnl mutex right away. It's a tradeoff and I initially
considered that, but the error handling would become slightly more complex
and there would be some duplicated logic between the OF and non-OF paths.
> > - return dsa_port_parse_cpu(dp, conduit, user_protocol);
> > + err = dsa_port_parse_cpu(dp, conduit, user_protocol);
> > + put_device(&conduit->dev);
> > + rtnl_unlock();
> > + return err;
> > }
> >
> > if (link)
> > @@ -1393,37 +1401,27 @@ static struct device *dev_find_class(struct device *parent, char *class)
> > return device_find_child(parent, class, dev_is_class);
> > }
> >
> > -static struct net_device *dsa_dev_to_net_device(struct device *dev)
> > -{
> > - struct device *d;
> > -
> > - d = dev_find_class(dev, "net");
> > - if (d != NULL) {
> > - struct net_device *nd;
> > -
> > - nd = to_net_dev(d);
> > - dev_hold(nd);
> > - put_device(d);
> > -
> > - return nd;
> > - }
> > -
> > - return NULL;
> > -}
> > -
> > static int dsa_port_parse(struct dsa_port *dp, const char *name,
> > struct device *dev)
> > {
> > if (!strcmp(name, "cpu")) {
> > struct net_device *conduit;
> > + struct device *d;
> > + int err;
> >
> > - conduit = dsa_dev_to_net_device(dev);
> > - if (!conduit)
> > + rtnl_lock();
> > + d = dev_find_class(dev, "net");
> > + if (!d) {
> > + rtnl_unlock();
> > return -EPROBE_DEFER;
> > + }
> >
> > - dev_put(conduit);
> > + conduit = to_net_dev(d);
> >
> > - return dsa_port_parse_cpu(dp, conduit, NULL);
> > + err = dsa_port_parse_cpu(dp, conduit, NULL);
> > + put_device(d);
> > + rtnl_unlock();
> > + return err;
> > }
> >
> > if (!strcmp(name, "dsa"))
> > @@ -1491,6 +1489,9 @@ static void dsa_switch_release_ports(struct dsa_switch *ds)
> > struct dsa_vlan *v, *n;
> >
> > dsa_switch_for_each_port_safe(dp, next, ds) {
> > + if (dsa_port_is_cpu(dp) && dp->conduit)
> > + netdev_put(dp->conduit, &dp->conduit_tracker);
> > +
> > /* These are either entries that upper layers lost track of
> > * (probably due to bugs), or installed through interfaces
> > * where one does not necessarily have to remove them, like
> > @@ -1635,8 +1636,10 @@ void dsa_switch_shutdown(struct dsa_switch *ds)
> > /* Disconnect from further netdevice notifiers on the conduit,
> > * since netdev_uses_dsa() will now return false.
> > */
> > - dsa_switch_for_each_cpu_port(dp, ds)
> > + dsa_switch_for_each_cpu_port(dp, ds) {
> > + netdev_put(dp->conduit, &dp->conduit_tracker);
> > dp->conduit->dsa_ptr = NULL;
>
> We should probably call netdev_put() only after clearing
> dp->conduit->ds_ptr, not before.
Ok.
> With that addressed,
>
> Reviewed-by: Jonas Gorski <jonas.gorski@...il.com>
Thanks for the feedback.
>
> > + }
> >
> > rtnl_unlock();
> > out:
> > --
> > 2.43.0
> >
Powered by blists - more mailing lists