Message-ID: <20250212163320.24d30adb@booty>
Date: Wed, 12 Feb 2025 16:33:20 +0100
From: Luca Ceresoli <luca.ceresoli@...tlin.com>
To: Saravana Kannan <saravanak@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, "Rafael J. Wysocki"
 <rafael@...nel.org>, Francesco <francesco.dolcini@...adex.com>, Geert
 Uytterhoeven <geert@...ux-m68k.org>, Tomi Valkeinen
 <tomi.valkeinen@...asonboard.com>, kernel-team@...roid.com,
 linux-kernel@...r.kernel.org, Rob Herring <robh@...nel.org>, Krzysztof
 Kozlowski <krzysztof.kozlowski@...aro.org>, Conor Dooley
 <conor@...nel.org>, Hervé Codina <herve.codina@...tlin.com>
Subject: Re: [PATCH v3] driver core: fw_devlink: Stop trying to optimize
 cycle detection logic

Hello,

On Fri, 6 Dec 2024 10:31:43 +0100
Luca Ceresoli <luca.ceresoli@...tlin.com> wrote:

> > After rebasing my work for the hotplug connector driver using device
> > tree overlays [0] on v6.13-rc1 I started getting these OF errors on
> > overlay removal:
> > 
> > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/panel-dsi-lvds
> > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/backlight-addon
> > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/battery-charger
> > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/regulator-addon-5v0-sys
> > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/regulator-addon-3v3-sys
> > 
> > ...and many more: exactly one for each device in the overlay 'devices'
> > node, each implemented by a platform driver.
> > 
> > Bisecting showed that this patch triggers these error messages, and
> > they do disappear when it is reverted.
> > 
> > I looked at the differences in dmesg and /sys/class/devlink/ in the
> > "good" and "bad" cases, and found almost no differences. The only
> > relevant difference is in cycle detection for the panel node, which was
> > expected, but nothing about all the other nodes like regulators.
> > 
> > Enabling debug messages in core.c also does not show significant
> > changes between the two cases, even though it's hard to be sure given
> > the verbosity of the log and the reordering of messages.
> > 
> > I suspect the new version of the cycle removal code is missing an
> > of_node_get() somewhere, but that is not directly visible in the patch
> > diff itself.  
> 
> I collected some more info by adding a bit of logging for one of the
> affected devices.
> 
> It looks like the of_node_get() and of_node_put() calls in the overlay
> loading phase are the same, though not in exactly the same order. So
> after overlay insertion we should have the same refcount with and
> without your patch.
> 
> There is a difference on overlay removal however: an of_node_put() call
> is absent with 6.13-rc1 code (errors emitted), and becomes present by
> just reverting your patch (the "good" case). Here's the stack trace of
> this call:
> 
>  Call trace:
>   show_stack+0x20/0x38 (C)
>   dump_stack_lvl+0x74/0x90
>   dump_stack+0x18/0x28
>   of_node_put+0x50/0x70
>   platform_device_release+0x24/0x68
>   device_release+0x3c/0xa0
>   kobject_put+0xa4/0x118
>   device_link_release_fn+0x60/0xd8
>   process_one_work+0x158/0x3c0
>   worker_thread+0x2d8/0x3e8
>   kthread+0x118/0x128
>   ret_from_fork+0x10/0x20
> 
> So for some reason device_link_release_fn() no longer leads to an
> of_node_put() call after adding your patch.
> 
> A quick code inspection did not give me enough information to
> understand more.
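
In the quoted stack trace, the node reference is dropped only from platform_device_release(), i.e. when the last reference to the device goes away (here via device_link_release_fn()). The userspace C sketch below models that chain with hypothetical toy types; toy_node, toy_device and toy_overlay_lifecycle() are invented for illustration and are not kernel APIs. It shows that a single device reference taken and never dropped is enough to keep the node refcount at 2 when overlay removal expects 1, the same symptom as the errors quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy types, NOT the kernel's: a refcounted DT node and
 * a device that pins its node, as a platform device pins its of_node. */
struct toy_node {
	const char *name;
	int refcount;
};

struct toy_device {
	struct toy_node *np;
	int refcount;
};

static void toy_node_put(struct toy_node *np)
{
	assert(np->refcount > 0);
	np->refcount--;
}

/* Plays the role of platform_device_release(): runs only when the last
 * device reference goes away, and drops the node reference. */
static void toy_device_put(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		toy_node_put(dev->np);
		dev->np = NULL;
	}
}

/* Overlay removal expects the changeset to hold the only node reference. */
static bool toy_node_check(const struct toy_node *np)
{
	if (np->refcount != 1) {
		printf("ERROR: expected refcount 1 instead of %d: %s\n",
		       np->refcount, np->name);
		return false;
	}
	return true;
}

/* Returns the node refcount seen at overlay removal time.
 * leak_one_device_ref simulates a device reference taken somewhere and
 * never dropped, so the device release (and its node put) never runs. */
static int toy_overlay_lifecycle(bool leak_one_device_ref)
{
	struct toy_node np = { "panel-dsi-lvds", 1 };  /* changeset ref */
	struct toy_device dev = { &np, 1 };           /* registration ref */

	np.refcount++;          /* the device pins its node */
	dev.refcount++;         /* a device link pins the device */
	if (leak_one_device_ref)
		dev.refcount++; /* extra get with no matching put */

	toy_device_put(&dev);   /* device unregistered */
	toy_device_put(&dev);   /* device link released (deferred work) */

	toy_node_check(&np);
	return np.refcount;
}
```

With balanced references, toy_overlay_lifecycle(false) ends with the node refcount back at 1; toy_overlay_lifecycle(true) prints the error and returns 2, because the last toy_device_put() never happens.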

I just sent a patch fixing this:
https://lore.kernel.org/20250212-fix__fw_devlink_relax_cycles_missing_device_put-v1-1-41818c7d7722@bootlin.com
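
Judging only from its URL, the patch adds a missing device put in the fw_devlink cycle-relaxing code. A common shape for that kind of bug is a lookup that returns a device with a reference held (in the kernel, get_device() returns its argument with the refcount incremented, and put_device() accepts NULL) and a caller that leaves through an early return without dropping it. The sketch below is hypothetical and does not reproduce the actual patch; toy_dev, toy_lookup_dev(), relax_buggy() and relax_fixed() are invented names.

```c
#include <assert.h>

/* Hypothetical toy device with a plain refcount, NOT struct device. */
struct toy_dev {
	int refcount;
	int in_cycle;
};

/* get_device()-style lookup: on success the caller owns one reference. */
static struct toy_dev *toy_lookup_dev(struct toy_dev *d)
{
	if (d)
		d->refcount++;
	return d;
}

/* put_device()-style release, safe to call with a NULL pointer. */
static void toy_put_dev(struct toy_dev *d)
{
	if (d) {
		assert(d->refcount > 0);
		d->refcount--;
	}
}

/* Buggy: the early return skips the put, leaking one reference. */
static int relax_buggy(struct toy_dev *target)
{
	struct toy_dev *d = toy_lookup_dev(target);

	if (!d)
		return 0;
	if (!d->in_cycle)
		return 0;          /* reference leaked here */
	toy_put_dev(d);
	return 1;
}

/* Fixed: every path leaves through one exit that drops the reference. */
static int relax_fixed(struct toy_dev *target)
{
	struct toy_dev *d = toy_lookup_dev(target);
	int ret = 0;

	if (!d)
		goto out;
	if (!d->in_cycle)
		goto out;
	ret = 1;
out:
	toy_put_dev(d);
	return ret;
}
```

The single-exit layout makes it harder to add a new early return that forgets the put; in this model, relax_buggy() on a device that is not in a cycle leaves its refcount one higher than before, while relax_fixed() always restores it.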

Luca

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
