lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOiHx==MmOsF4aeBf8zbEaHgN59Q4b=FMUEvLdJC9xXfFa5FLA@mail.gmail.com>
Date: Mon, 15 Dec 2025 11:13:58 +0100
From: Jonas Gorski <jonas.gorski@...il.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Ma Ke <make24@...as.ac.cn>, 
	Simon Horman <horms@...nel.org>, Florian Fainelli <f.fainelli@...il.com>
Subject: Re: [PATCH net 1/2] net: dsa: properly keep track of conduit reference

Hi,

On Sun, Dec 14, 2025 at 7:25 PM Vladimir Oltean <vladimir.oltean@....com> wrote:
>
> Problem description
> -------------------
>
> DSA has a mumbo-jumbo of reference handling of the conduit net device
> and its kobject which, sadly, is just wrong and doesn't make sense.
>
> There are two distinct problems.
>
> 1. The OF path, which uses of_find_net_device_by_node(), never releases
>    the elevated refcount on the conduit's kobject. Nominally, the OF and
>    non-OF paths should have identical code paths, and it is already
>    suspicious that dsa_dev_to_net_device() has a put_device() call which
>    is missing in dsa_port_parse_of(), but we can actually even verify
>    that an issue exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run
>    this command "before" and "after" applying this patch:
>
> (unbind the conduit driver for net device eno2)
> echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind
>
> we see these lines in the output diff which appear only with the patch
> applied:
>
> kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000)
> kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000)
>
> 2. After we find the conduit interface one way (OF) or another (non-OF),
>    it can get unregistered at any time, and DSA remains with a long-lived,
>    but in this case stale, cpu_dp->conduit pointer. Holding the net
>    device's underlying kobject isn't actually of much help, it just
>    prevents it from being freed (but we never need that kobject
>    directly). What helps us to prevent the net device from being
>    unregistered is the parallel netdev reference mechanism (dev_hold()
>    and dev_put()).
>
> Actually we actually use that netdev tracker mechanism implicitly on
> user ports since commit 2f1e8ea726e9 ("net: dsa: link interfaces with
> the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link().
> But time still passes at DSA switch probe time between the initial
> of_find_net_device_by_node() code and the user port creation time, time
> during which the conduit could unregister itself and DSA wouldn't know
> about it.
>
> So we have to run of_find_net_device_by_node() under rtnl_lock() to
> prevent that from happening, and release the lock only with the netdev
> tracker having acquired the reference.
>
> Do we need to keep the reference until dsa_unregister_switch() /
> dsa_switch_shutdown()?
> 1: Maybe yes. A switch device will still be registered even if all user
>    ports failed to probe, see commit 86f8b1c01a0a ("net: dsa: Do not
>    make user port errors fatal"), and the cpu_dp->conduit pointers
>    remain valid.  I haven't audited all call paths to see whether they
>    will actually use the conduit in lack of any user port, but if they
>    do, it seems safer to not rely on user ports for that reference.
> 2. Definitely yes. We support changing the conduit which a user port is
>    associated to, and we can get into a situation where we've moved all
>    user ports away from a conduit, thus no longer hold any reference to
>    it via the net device tracker. But we shouldn't let it go nonetheless
>    - see the next change in relation to dsa_tree_find_first_conduit()
>    and LAG conduits which disappear.
>    We have to be prepared to return to the physical conduit, so the CPU
>    port must explicitly keep another reference to it. This is also to
>    say: the user ports and their CPU ports may not always keep a
>    reference to the same conduit net device, and both are needed.
>
> As for the conduit's kobject for the /sys/class/net/ entry, we don't
> care about it, we can release it as soon as we hold the net device
> object itself.
>
> History and blame attribution
> -----------------------------
>
> The code has been refactored so many times, it is very difficult to
> follow and properly attribute a blame, but I'll try to make a short
> history which I hope to be correct.
>
> We have two distinct probing paths:
> - one for OF, introduced in 2016 in commit 83c0afaec7b7 ("net: dsa: Add
>   new binding implementation")
> - one for non-OF, introduced in 2017 in commit 71e0bbde0d88 ("net: dsa:
>   Add support for platform data")
>
> These are both complete rewrites of the original probing paths (which
> used struct dsa_switch_driver and other weird stuff, instead of regular
> devices on their respective buses for register access, like MDIO, SPI,
> I2C etc):
> - one for OF, introduced in 2013 in commit 5e95329b701c ("dsa: add
>   device tree bindings to register DSA switches")
> - one for non-OF, introduced in 2015 in commit 91da11f870f0 ("net:
>   Distributed Switch Architecture protocol support")
>
> except for tiny bits and pieces like dsa_dev_to_net_device() which were
> seemingly carried over since the original commit, and used to this day.
>
> The point is that the original probing paths received a fix in 2015 in
> the form of commit 679fb46c5785 ("net: dsa: Add missing master netdev
> dev_put() calls"), but the fix never made it into the "new" (dsa2)
> probing paths that can still be traced to today, and the fixed probing
> path was later deleted in 2019 in commit 93e86b3bc842 ("net: dsa: Remove
> legacy probing support").
>
> That is to say, the new probing paths were never quite correct in this
> area.
>
> The existence of the legacy probing support which was deleted in 2019
> explains why dsa_dev_to_net_device() returns a conduit with elevated
> refcount (because it was supposed to be released during
> dsa_remove_dst()). After the removal of the legacy code, the only user
> of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this
> function returns. This pattern makes no sense today, and can only be
> interpreted historically to understand why dev_hold() was there in the
> first place.
>
> Change details
> --------------
>
> Today we have a better netdev tracking infrastructure which we should
> use. It belongs in common code (dsa_port_parse_cpu()), which shows that
> the OF and non-OF code paths aren't actually so different.
>
> When dsa_port_parse_cpu() or any subsequent function during setup fails,
> dsa_switch_release_ports() will be called. However, dsa_port_parse_cpu()
> may fail prior to us assigning cpu_dp->conduit and bumping the reference.
> So we have to test for the conduit being NULL prior to calling
> netdev_put().
>
> There have still been so many transformations to the code since the
> blamed commits (rename master -> conduit, commit 0650bf52b31f ("net:
> dsa: be compatible with masters which unregister on shutdown")), that it
> only makes sense to fix the code using the best methods available today
> and see how it can be backported to stable later. I suspect the fix
> cannot even be backported to kernels which lack dsa_switch_shutdown(),
> and I suspect this is also maybe why the long-lived conduit reference
> didn't make it into the new DSA probing paths at the time (problems
> during shutdown).
>
> Because dsa_dev_to_net_device() has a single call site and has to be
> changed anyway, the logic was just absorbed into the non-OF
> dsa_port_parse().

Largely matches my observations.

>
> Tested on the ocelot/felix switch and on dsa_loop, both on the NXP
> LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y.

Also tested on b53, but since it neither has multiple conduits nor LAG
support (yet), the testing was very limited.

Two minor nits/comments though ...

>
> Reported-by: Ma Ke <make24@...as.ac.cn>
> Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/
> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> Fixes: 71e0bbde0d88 ("net: dsa: Add support for platform data")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
>  include/net/dsa.h |  1 +
>  net/dsa/dsa.c     | 53 +++++++++++++++++++++++++----------------------
>  2 files changed, 29 insertions(+), 25 deletions(-)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index cced1a866757..6b2b5ed64ea4 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -302,6 +302,7 @@ struct dsa_port {
>         struct devlink_port     devlink_port;
>         struct phylink          *pl;
>         struct phylink_config   pl_config;
> +       netdevice_tracker       conduit_tracker;
>         struct dsa_lag          *lag;
>         struct net_device       *hsr_dev;
>
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index a20efabe778f..ac7900113d2b 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -1221,6 +1221,7 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *conduit,
>                 dst->tag_ops = tag_ops;
>         }
>
> +       netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL);
>         dp->conduit = conduit;
>         dp->type = DSA_PORT_TYPE_CPU;
>         dsa_port_set_tag_protocol(dp, dst->tag_ops);
> @@ -1253,14 +1254,21 @@ static int dsa_port_parse_of(struct dsa_port *dp, struct device_node *dn)
>         if (ethernet) {
>                 struct net_device *conduit;
>                 const char *user_protocol;
> +               int err;
>
> +               rtnl_lock();
>                 conduit = of_find_net_device_by_node(ethernet);
>                 of_node_put(ethernet);
> -               if (!conduit)
> +               if (!conduit) {
> +                       rtnl_unlock();
>                         return -EPROBE_DEFER;
> +               }
>
>                 user_protocol = of_get_property(dn, "dsa-tag-protocol", NULL);

Maybe move this to before the rtnl_lock()? Not sure if this makes a
difference (avoiding a lookup for netdev not yet there vs avoiding a
lookup under rtnl lock).

> -               return dsa_port_parse_cpu(dp, conduit, user_protocol);
> +               err = dsa_port_parse_cpu(dp, conduit, user_protocol);
> +               put_device(&conduit->dev);
> +               rtnl_unlock();
> +               return err;
>         }
>
>         if (link)
> @@ -1393,37 +1401,27 @@ static struct device *dev_find_class(struct device *parent, char *class)
>         return device_find_child(parent, class, dev_is_class);
>  }
>
> -static struct net_device *dsa_dev_to_net_device(struct device *dev)
> -{
> -       struct device *d;
> -
> -       d = dev_find_class(dev, "net");
> -       if (d != NULL) {
> -               struct net_device *nd;
> -
> -               nd = to_net_dev(d);
> -               dev_hold(nd);
> -               put_device(d);
> -
> -               return nd;
> -       }
> -
> -       return NULL;
> -}
> -
>  static int dsa_port_parse(struct dsa_port *dp, const char *name,
>                           struct device *dev)
>  {
>         if (!strcmp(name, "cpu")) {
>                 struct net_device *conduit;
> +               struct device *d;
> +               int err;
>
> -               conduit = dsa_dev_to_net_device(dev);
> -               if (!conduit)
> +               rtnl_lock();
> +               d = dev_find_class(dev, "net");
> +               if (!d) {
> +                       rtnl_unlock();
>                         return -EPROBE_DEFER;
> +               }
>
> -               dev_put(conduit);
> +               conduit = to_net_dev(d);
>
> -               return dsa_port_parse_cpu(dp, conduit, NULL);
> +               err = dsa_port_parse_cpu(dp, conduit, NULL);
> +               put_device(d);
> +               rtnl_unlock();
> +               return err;
>         }
>
>         if (!strcmp(name, "dsa"))
> @@ -1491,6 +1489,9 @@ static void dsa_switch_release_ports(struct dsa_switch *ds)
>         struct dsa_vlan *v, *n;
>
>         dsa_switch_for_each_port_safe(dp, next, ds) {
> +               if (dsa_port_is_cpu(dp) && dp->conduit)
> +                       netdev_put(dp->conduit, &dp->conduit_tracker);
> +
>                 /* These are either entries that upper layers lost track of
>                  * (probably due to bugs), or installed through interfaces
>                  * where one does not necessarily have to remove them, like
> @@ -1635,8 +1636,10 @@ void dsa_switch_shutdown(struct dsa_switch *ds)
>         /* Disconnect from further netdevice notifiers on the conduit,
>          * since netdev_uses_dsa() will now return false.
>          */
> -       dsa_switch_for_each_cpu_port(dp, ds)
> +       dsa_switch_for_each_cpu_port(dp, ds) {
> +               netdev_put(dp->conduit, &dp->conduit_tracker);
>                 dp->conduit->dsa_ptr = NULL;

We should probably call netdev_put() only after clearing
dp->conduit->ds_ptr, not before.

With that addressed,

Reviewed-by: Jonas Gorski <jonas.gorski@...il.com>

> +       }
>
>         rtnl_unlock();
>  out:
> --
> 2.43.0
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ