lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20251215110424.ovvp3x6mgf765gzc@skbuf>
Date: Mon, 15 Dec 2025 13:04:24 +0200
From: Vladimir Oltean <vladimir.oltean@....com>
To: Jonas Gorski <jonas.gorski@...il.com>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Ma Ke <make24@...as.ac.cn>, Simon Horman <horms@...nel.org>,
	Florian Fainelli <f.fainelli@...il.com>
Subject: Re: [PATCH net 1/2] net: dsa: properly keep track of conduit
 reference

On Mon, Dec 15, 2025 at 11:13:58AM +0100, Jonas Gorski wrote:
> Hi,
> 
> On Sun, Dec 14, 2025 at 7:25 PM Vladimir Oltean <vladimir.oltean@....com> wrote:
> >
> > Problem description
> > -------------------
> >
> > DSA has a mumbo-jumbo of reference handling of the conduit net device
> > and its kobject which, sadly, is just wrong and doesn't make sense.
> >
> > There are two distinct problems.
> >
> > 1. The OF path, which uses of_find_net_device_by_node(), never releases
> >    the elevated refcount on the conduit's kobject. Nominally, the OF and
> >    non-OF paths should have identical code paths, and it is already
> >    suspicious that dsa_dev_to_net_device() has a put_device() call which
> >    is missing in dsa_port_parse_of(), but we can actually even verify
> >    that an issue exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run
> >    this command "before" and "after" applying this patch:
> >
> > (unbind the conduit driver for net device eno2)
> > echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind
> >
> > we see these lines in the output diff which appear only with the patch
> > applied:
> >
> > kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000)
> > kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000)
> >
> > 2. After we find the conduit interface one way (OF) or another (non-OF),
> >    it can get unregistered at any time, and DSA remains with a long-lived,
> >    but in this case stale, cpu_dp->conduit pointer. Holding the net
> >    device's underlying kobject isn't actually of much help, it just
> >    prevents it from being freed (but we never need that kobject
> >    directly). What helps us to prevent the net device from being
> >    unregistered is the parallel netdev reference mechanism (dev_hold()
> >    and dev_put()).
> >
> > Actually we actually use that netdev tracker mechanism implicitly on
> > user ports since commit 2f1e8ea726e9 ("net: dsa: link interfaces with
> > the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link().
> > But time still passes at DSA switch probe time between the initial
> > of_find_net_device_by_node() code and the user port creation time, time
> > during which the conduit could unregister itself and DSA wouldn't know
> > about it.
> >
> > So we have to run of_find_net_device_by_node() under rtnl_lock() to
> > prevent that from happening, and release the lock only with the netdev
> > tracker having acquired the reference.
> >
> > Do we need to keep the reference until dsa_unregister_switch() /
> > dsa_switch_shutdown()?
> > 1: Maybe yes. A switch device will still be registered even if all user
> >    ports failed to probe, see commit 86f8b1c01a0a ("net: dsa: Do not
> >    make user port errors fatal"), and the cpu_dp->conduit pointers
> >    remain valid.  I haven't audited all call paths to see whether they
> >    will actually use the conduit in lack of any user port, but if they
> >    do, it seems safer to not rely on user ports for that reference.
> > 2. Definitely yes. We support changing the conduit which a user port is
> >    associated to, and we can get into a situation where we've moved all
> >    user ports away from a conduit, thus no longer hold any reference to
> >    it via the net device tracker. But we shouldn't let it go nonetheless
> >    - see the next change in relation to dsa_tree_find_first_conduit()
> >    and LAG conduits which disappear.
> >    We have to be prepared to return to the physical conduit, so the CPU
> >    port must explicitly keep another reference to it. This is also to
> >    say: the user ports and their CPU ports may not always keep a
> >    reference to the same conduit net device, and both are needed.
> >
> > As for the conduit's kobject for the /sys/class/net/ entry, we don't
> > care about it, we can release it as soon as we hold the net device
> > object itself.
> >
> > History and blame attribution
> > -----------------------------
> >
> > The code has been refactored so many times, it is very difficult to
> > follow and properly attribute a blame, but I'll try to make a short
> > history which I hope to be correct.
> >
> > We have two distinct probing paths:
> > - one for OF, introduced in 2016 in commit 83c0afaec7b7 ("net: dsa: Add
> >   new binding implementation")
> > - one for non-OF, introduced in 2017 in commit 71e0bbde0d88 ("net: dsa:
> >   Add support for platform data")
> >
> > These are both complete rewrites of the original probing paths (which
> > used struct dsa_switch_driver and other weird stuff, instead of regular
> > devices on their respective buses for register access, like MDIO, SPI,
> > I2C etc):
> > - one for OF, introduced in 2013 in commit 5e95329b701c ("dsa: add
> >   device tree bindings to register DSA switches")
> > - one for non-OF, introduced in 2015 in commit 91da11f870f0 ("net:
> >   Distributed Switch Architecture protocol support")
> >
> > except for tiny bits and pieces like dsa_dev_to_net_device() which were
> > seemingly carried over since the original commit, and used to this day.
> >
> > The point is that the original probing paths received a fix in 2015 in
> > the form of commit 679fb46c5785 ("net: dsa: Add missing master netdev
> > dev_put() calls"), but the fix never made it into the "new" (dsa2)
> > probing paths that can still be traced to today, and the fixed probing
> > path was later deleted in 2019 in commit 93e86b3bc842 ("net: dsa: Remove
> > legacy probing support").
> >
> > That is to say, the new probing paths were never quite correct in this
> > area.
> >
> > The existence of the legacy probing support which was deleted in 2019
> > explains why dsa_dev_to_net_device() returns a conduit with elevated
> > refcount (because it was supposed to be released during
> > dsa_remove_dst()). After the removal of the legacy code, the only user
> > of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this
> > function returns. This pattern makes no sense today, and can only be
> > interpreted historically to understand why dev_hold() was there in the
> > first place.
> >
> > Change details
> > --------------
> >
> > Today we have a better netdev tracking infrastructure which we should
> > use. It belongs in common code (dsa_port_parse_cpu()), which shows that
> > the OF and non-OF code paths aren't actually so different.
> >
> > When dsa_port_parse_cpu() or any subsequent function during setup fails,
> > dsa_switch_release_ports() will be called. However, dsa_port_parse_cpu()
> > may fail prior to us assigning cpu_dp->conduit and bumping the reference.
> > So we have to test for the conduit being NULL prior to calling
> > netdev_put().
> >
> > There have still been so many transformations to the code since the
> > blamed commits (rename master -> conduit, commit 0650bf52b31f ("net:
> > dsa: be compatible with masters which unregister on shutdown")), that it
> > only makes sense to fix the code using the best methods available today
> > and see how it can be backported to stable later. I suspect the fix
> > cannot even be backported to kernels which lack dsa_switch_shutdown(),
> > and I suspect this is also maybe why the long-lived conduit reference
> > didn't make it into the new DSA probing paths at the time (problems
> > during shutdown).
> >
> > Because dsa_dev_to_net_device() has a single call site and has to be
> > changed anyway, the logic was just absorbed into the non-OF
> > dsa_port_parse().
> 
> Largely matches my observations.
> 
> >
> > Tested on the ocelot/felix switch and on dsa_loop, both on the NXP
> > LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y.
> 
> Also tested on b53, but since it neither has multiple conduits nor LAG
> support (yet), the testing was very limited.
> 
> Two minor nits/comments though ...
> >
> > Reported-by: Ma Ke <make24@...as.ac.cn>
> > Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/
> > Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> > Fixes: 71e0bbde0d88 ("net: dsa: Add support for platform data")
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> > ---
> >  include/net/dsa.h |  1 +
> >  net/dsa/dsa.c     | 53 +++++++++++++++++++++++++----------------------
> >  2 files changed, 29 insertions(+), 25 deletions(-)
> >
> > diff --git a/include/net/dsa.h b/include/net/dsa.h
> > index cced1a866757..6b2b5ed64ea4 100644
> > --- a/include/net/dsa.h
> > +++ b/include/net/dsa.h
> > @@ -302,6 +302,7 @@ struct dsa_port {
> >         struct devlink_port     devlink_port;
> >         struct phylink          *pl;
> >         struct phylink_config   pl_config;
> > +       netdevice_tracker       conduit_tracker;
> >         struct dsa_lag          *lag;
> >         struct net_device       *hsr_dev;
> >
> > diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> > index a20efabe778f..ac7900113d2b 100644
> > --- a/net/dsa/dsa.c
> > +++ b/net/dsa/dsa.c
> > @@ -1221,6 +1221,7 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *conduit,
> >                 dst->tag_ops = tag_ops;
> >         }
> >
> > +       netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL);
> >         dp->conduit = conduit;
> >         dp->type = DSA_PORT_TYPE_CPU;
> >         dsa_port_set_tag_protocol(dp, dst->tag_ops);
> > @@ -1253,14 +1254,21 @@ static int dsa_port_parse_of(struct dsa_port *dp, struct device_node *dn)
> >         if (ethernet) {
> >                 struct net_device *conduit;
> >                 const char *user_protocol;
> > +               int err;
> >
> > +               rtnl_lock();
> >                 conduit = of_find_net_device_by_node(ethernet);
> >                 of_node_put(ethernet);
> > -               if (!conduit)
> > +               if (!conduit) {
> > +                       rtnl_unlock();
> >                         return -EPROBE_DEFER;
> > +               }
> >
> >                 user_protocol = of_get_property(dn, "dsa-tag-protocol", NULL);
> 
> Maybe move this to before the rtnl_lock()? Not sure if this makes a
> difference (avoiding a lookup for netdev not yet there vs avoiding a
> lookup under rtnl lock).

If keeping the rtnl mutex acquired for too long is a concern (dsa_port_parse_cpu()
also calls request_module(), so there's that), I guess I could also reconsider
moving the netdev_hold() call right after the successful of_find_net_device_by_node()
call, and release the rtnl mutex right away. It's a tradeoff and I initially
considered that, but the error handling would become slightly more complex
and there would be some duplicated logic between the OF and non-OF paths.

> > -               return dsa_port_parse_cpu(dp, conduit, user_protocol);
> > +               err = dsa_port_parse_cpu(dp, conduit, user_protocol);
> > +               put_device(&conduit->dev);
> > +               rtnl_unlock();
> > +               return err;
> >         }
> >
> >         if (link)
> > @@ -1393,37 +1401,27 @@ static struct device *dev_find_class(struct device *parent, char *class)
> >         return device_find_child(parent, class, dev_is_class);
> >  }
> >
> > -static struct net_device *dsa_dev_to_net_device(struct device *dev)
> > -{
> > -       struct device *d;
> > -
> > -       d = dev_find_class(dev, "net");
> > -       if (d != NULL) {
> > -               struct net_device *nd;
> > -
> > -               nd = to_net_dev(d);
> > -               dev_hold(nd);
> > -               put_device(d);
> > -
> > -               return nd;
> > -       }
> > -
> > -       return NULL;
> > -}
> > -
> >  static int dsa_port_parse(struct dsa_port *dp, const char *name,
> >                           struct device *dev)
> >  {
> >         if (!strcmp(name, "cpu")) {
> >                 struct net_device *conduit;
> > +               struct device *d;
> > +               int err;
> >
> > -               conduit = dsa_dev_to_net_device(dev);
> > -               if (!conduit)
> > +               rtnl_lock();
> > +               d = dev_find_class(dev, "net");
> > +               if (!d) {
> > +                       rtnl_unlock();
> >                         return -EPROBE_DEFER;
> > +               }
> >
> > -               dev_put(conduit);
> > +               conduit = to_net_dev(d);
> >
> > -               return dsa_port_parse_cpu(dp, conduit, NULL);
> > +               err = dsa_port_parse_cpu(dp, conduit, NULL);
> > +               put_device(d);
> > +               rtnl_unlock();
> > +               return err;
> >         }
> >
> >         if (!strcmp(name, "dsa"))
> > @@ -1491,6 +1489,9 @@ static void dsa_switch_release_ports(struct dsa_switch *ds)
> >         struct dsa_vlan *v, *n;
> >
> >         dsa_switch_for_each_port_safe(dp, next, ds) {
> > +               if (dsa_port_is_cpu(dp) && dp->conduit)
> > +                       netdev_put(dp->conduit, &dp->conduit_tracker);
> > +
> >                 /* These are either entries that upper layers lost track of
> >                  * (probably due to bugs), or installed through interfaces
> >                  * where one does not necessarily have to remove them, like
> > @@ -1635,8 +1636,10 @@ void dsa_switch_shutdown(struct dsa_switch *ds)
> >         /* Disconnect from further netdevice notifiers on the conduit,
> >          * since netdev_uses_dsa() will now return false.
> >          */
> > -       dsa_switch_for_each_cpu_port(dp, ds)
> > +       dsa_switch_for_each_cpu_port(dp, ds) {
> > +               netdev_put(dp->conduit, &dp->conduit_tracker);
> >                 dp->conduit->dsa_ptr = NULL;
> 
> We should probably call netdev_put() only after clearing
> dp->conduit->ds_ptr, not before.

Ok.

> With that addressed,
> 
> Reviewed-by: Jonas Gorski <jonas.gorski@...il.com>

Thanks for the feedback.

> 
> > +       }
> >
> >         rtnl_unlock();
> >  out:
> > --
> > 2.43.0
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ