lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAL_Jsq+oqmci-11TB83Hx3Ry44+OKNjQyu14i-zKdNNypE30nw@mail.gmail.com>
Date: Tue, 11 Feb 2025 08:15:39 -0600
From: Rob Herring <robh@...nel.org>
To: Laurent Pinchart <laurent.pinchart@...asonboard.com>
Cc: "zhangzekun (A)" <zhangzekun11@...wei.com>, Oleksij Rempel <o.rempel@...gutronix.de>, 
	saravanak@...gle.com, justin.chen@...adcom.com, florian.fainelli@...adcom.com, 
	andrew+netdev@...n.ch, kuba@...nel.org, kory.maincent@...tlin.com, 
	jacopo+renesas@...ndi.org, kieran.bingham+renesas@...asonboard.com, 
	maddy@...ux.ibm.com, mpe@...erman.id.au, npiggin@...il.com, olteanv@...il.com, 
	davem@...emloft.net, taras.chornyi@...ision.eu, edumazet@...gle.com, 
	pabeni@...hat.com, sudeep.holla@....com, cristian.marussi@....com, 
	arm-scmi@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org, 
	linux-media@...r.kernel.org, netdev@...r.kernel.org, 
	devicetree@...r.kernel.org, chenjun102@...wei.com
Subject: Re: [PATCH 1/9] of: Add warpper function of_find_node_by_name_balanced()

On Fri, Feb 7, 2025 at 9:37 AM Laurent Pinchart
<laurent.pinchart@...asonboard.com> wrote:
>
> Hi Zekun,
>
> On Fri, Feb 07, 2025 at 07:28:23PM +0800, zhangzekun (A) wrote:
> > 在 2025/2/7 16:24, Oleksij Rempel 写道:
> > > On Fri, Feb 07, 2025 at 09:31:09AM +0800, Zhang Zekun wrote:
> > >> There are many drivers use of_find_node_by_name() with a not-NULL
> > >> device_node pointer, and a number of callers would require a call to
> > >> of_node_get() before using it. There are also some drivers who forget
> > >> to call of_node_get() which would cause a ref count leak[1]. So, Add a
> > >> wraper function for of_find_node_by_name(), drivers may use this function
> > >> to call of_find_node_by_name() with the refcount already balanced.
> > >>
> > >> [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/
> > >
> > > Hi Zhang Zekun,
> > >
> > > thank you for working on this issue!
> > >
> > > First of all, let's take a step back and analyze the initial problem.
> > > Everything following is only my opinion...
> > >
> > > The main issue I see is that the current API - of_find_node_by_name -
> > > modifies the refcount of its input by calling of_node_put(from) as part
> > > of its search. Typically, a "find" function is expected to treat its
> > > input as read-only. That is, when you pass an object into such a
> > > function, you expect its reference count to remain unchanged unless
> > > ownership is explicitly transferred. In this case, lowering the refcount
> > > on the input node is counterintuitive and already lead to unexpected
> > > behavior and subtle bugs.
> > >
> > > To address this, the workaround introduces a wrapper function,
> > > of_find_node_by_name_balanced, which first increments the input’s
> > > refcount (via of_node_get()) before calling the original function. While
> > > this "balances" the refcount change, the naming remains problematic from
> > > my perspective. The "_balanced" suffix isn’t part of our common naming
> > > conventions (traditions? :)). Most drivers expect that a function
> > > starting with "find" will not alter the reference count of its input.
> > > The term "balanced" doesn’t clearly convey that the input's refcount is
> > > being explicitly managed - it instead obscures the underlying behavior,
> > > leaving many developers confused about what guarantees the API provides.
> > >
> > > In my view, a more natural solution would be to redesign the API so that
> > > it doesn’t modify the input object’s refcount at all. Instead, it should
> > > solely increase the refcount of the returned node (if found) for safe
> > > asynchronous usage. This approach would align with established
> > > conventions where "find" implies no side effects on inputs or output,
> > > and a "get" indicates that the output comes with an extra reference. For
> > > example, a function named of_get_node_by_name would clearly signal that
> > > only the returned node is subject to a refcount increase while leaving
> > > the input intact.
> > >
> > > Thus, while the current workaround "balances" the reference count, it
> > > doesn't address the underlying design flaw. The naming still suggests a
> > > "find" function that should leave the input untouched, which isn’t the
> > > case here. A redesign of the API - with both the behavior and naming
> > > aligned to common expectations - would be a clearer and more robust
> > > solution.
> > >
> > > Nevertheless, it is only my POV, and the final decision rests with the
> > > OpenFirmware framework maintainers.
> > >
> > > Best Regards,
> > > Oleksij
> >
> > Hi, Oleksij,
> >
> > Thanks for your review. I think redesign the API would be a fundamental
> > way to fix this issue, but it would cause impact for current users who
> > knows the exact functionality of of_find_node_by_name(), which need to
> > put the "from" before calling it (I can't make the assumption that all
> > of drivers calling of_find_node_by_name() with a not-NULL "from"
> > pointer, but not calling of_node_get() before is misuse). The basic idea
> > for adding a new function is try not to impact current users. For users
> > who are confused about the "_balanced" suffix,the comments of
> > of_find_node_by_name() explains why this interface exists.
>
> I think we all agree that of_find_node_by_name() is miused, and that it
> shows the API isn't optimal. What we have different opinions on is how
> to make the API less error-prone. I think adding a new
> of_find_node_by_name_balanced() function works around the issue and
> doesn't improve the situation much, I would argue it makes things even
> more confusing.
>
> We have only 20 calls to of_find_node_by_name() with a non-NULL first
> argument in v6.14-rc1:
>
> arch/powerpc/platforms/chrp/pci.c:      rtas = of_find_node_by_name (root, "rtas");
>
> The 'root' variable here is the result of a call to
> 'of_find_node_by_path("/")', so I think we could pass a null pointer
> instead to simplify things.

I think this could just be 'of_find_node_by_path("/rtas")' which does
occur elsewhere.

> arch/powerpc/platforms/powermac/pic.c:          slave = of_find_node_by_name(master, "mac-io");
>
> Here I believe of_find_node_by_name() is called to find a *child* node
> of 'master'. of_find_node_by_name() is the wrong function for that.

Yes, I think that's always a child of the PCI bus.

>
> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(rootnp, "GAISLER_IRQMP");
> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(rootnp, "01_00d");
> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(nnp, "GAISLER_GPTIMER");
> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(nnp, "01_011");
>
> Here too the code seems to be looking for child nodes only (but I
> couldn't find a DT example or binding in-tree, so I'm not entirely
> sure).

Sparc doesn't use OF_DYNAMIC, so who cares... But there are some Sparc
DT dumps here:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/prtconfs.git/

>
> drivers/clk/ti/clk.c:   return of_find_node_by_name(from, tmp);
>
> Usage here seems correct, the reference-count decrement is intended.
>
> drivers/media/i2c/max9286.c:    i2c_mux = of_find_node_by_name(dev->of_node, "i2c-mux");
> drivers/media/platform/qcom/venus/core.c:       enp = of_find_node_by_name(dev->of_node, node_name);
> drivers/net/dsa/bcm_sf2.c:      ports = of_find_node_by_name(dn, "ports");
> drivers/net/dsa/hirschmann/hellcreek_ptp.c:     leds = of_find_node_by_name(hellcreek->dev->of_node, "leds");
> drivers/net/ethernet/broadcom/asp2/bcmasp.c:    ports_node = of_find_node_by_name(dev->of_node, "ethernet-ports");
> drivers/net/ethernet/marvell/prestera/prestera_main.c:  ports = of_find_node_by_name(sw->np, "ports");
> drivers/net/pse-pd/tps23881.c:  channels_node = of_find_node_by_name(priv->np, "channels");
> drivers/regulator/scmi-regulator.c:     np = of_find_node_by_name(handle->dev->of_node, "regulators");
> drivers/regulator/tps6594-regulator.c:          np = of_find_node_by_name(tps->dev->of_node, multi_regs[multi].supply_name);
>
> Incorrect usage, as far as I understand all those drivers are looking
> for child nodes only.
>
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest16");
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest17");
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest18");
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest19");
>
> Here too I think only child nodes are meant to be considered.
>
> of_find_node_by_name() is very much misused as most callers want to find
> child nodes, while of_find_node_by_name() will walk the whole DT from a
> given starting point.

Agreed. In general, it's preferred to look for nodes by compatible
rather than node name with child nodes being an exception.

> I think the right fix here is to
>
> - Replace of_find_node_by_name(root, ...) with
>   of_find_node_by_name(NULL, ...) in arch/powerpc/platforms/chrp/pci.c
>   (if my understanding of the code is correct).
>
> - Replace of_find_node_by_name() with of_get_child_by_name() in callers
>   that need to search immediate children only (I expected that to be the
>   majority of the above call sites).

+1

> - If there are other callers that need to find indirect children,
>   introduce a new of_get_child_by_name_recursive() function.
>
> At that point, the only remaining caller of of_find_node_by_name()
> (beside its usage in the for_each_node_by_name() macro) will be
> drivers/clk/ti/clk.c, which uses the function correctly.
>
> I'm tempted to then rename of_find_node_by_name() to
> __of_find_node_by_name() to indicate it's an internal function not meant
> to be called except in special cases. It could all be renamed to
> __of_find_next_node_by_name() to make its behaviour clearer.

+1

There's a number of functions which I classify as "don't add new
users" which I keep meaning to document. It's generally older stuff
still using them. Anything returning pointers to raw DT data are
problematic for dynamic DT. Hence all the patches removing
of_find_property/of_get_property calls.

Rob

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ