Message-ID: <CAGETcx-iZ67NMZtAQLDj5CnftsYoEetMvu1fpsgJjb6ar7bCeQ@mail.gmail.com>
Date: Thu, 13 Feb 2025 00:08:51 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Luca Ceresoli <luca.ceresoli@...tlin.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
Francesco <francesco.dolcini@...adex.com>, Geert Uytterhoeven <geert@...ux-m68k.org>,
Tomi Valkeinen <tomi.valkeinen@...asonboard.com>, kernel-team@...roid.com,
linux-kernel@...r.kernel.org, Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>, Conor Dooley <conor@...nel.org>,
Hervé Codina <herve.codina@...tlin.com>
Subject: Re: [PATCH v3] driver core: fw_devlink: Stop trying to optimize cycle detection logic

On Wed, Feb 12, 2025 at 7:33 AM Luca Ceresoli <luca.ceresoli@...tlin.com> wrote:
>
> Hello,
>
> On Fri, 6 Dec 2024 10:31:43 +0100
> Luca Ceresoli <luca.ceresoli@...tlin.com> wrote:
>
> > > After rebasing my work for the hotplug connector driver using device
> > > tree overlays [0] on v6.13-rc1 I started getting these OF errors on
> > > overlay removal:
> > >
> > > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/panel-dsi-lvds
> > > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/backlight-addon
> > > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/battery-charger
> > > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/regulator-addon-5v0-sys
> > > OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /addon-connector/devices/regulator-addon-3v3-sys
> > >
> > > ...and many more. Exactly one per each device in the overlay 'devices'
> > > node, each implemented by a platform driver.
> > >
> > > Bisecting found this patch is triggering these error messages, which
> > > in fact disappear by reverting it.
> > >
> > > I looked at the differences in dmesg and /sys/class/devlink/ in the
> > > "good" and "bad" cases, and found almost no differences. The only
> > > relevant difference is in cycle detection for the panel node, which was
> > > expected, but nothing about all the other nodes like regulators.
> > >
> > > Enabling debug messages in core.c also does not show significant
> > > changes between the two cases, even though it's hard to be sure given
> > > the verbosity of the log and the reordering of messages.
> > >
> > > I suspect the new version of the cycle removal code is missing an
> > > of_node_get() somewhere, but that is not directly visible in the patch
> > > diff itself.
> >
> > I collected some more info by adding a bit of logging for one of the
> > affected devices.
> >
> > It looks like the of_node_get() and of_node_put() in the overlay
> > loading phase are the same, even though not completely in the same
> > order. So after overlay insertion we should have the same refcount with
> > and without your patch.
> >
> > There is a difference on overlay removal however: an of_node_put() call
> > is absent with 6.13-rc1 code (errors emitted), and becomes present by
> > just reverting your patch (the "good" case). Here's the stack trace of
> > this call:
> >
> > Call trace:
> > show_stack+0x20/0x38 (C)
> > dump_stack_lvl+0x74/0x90
> > dump_stack+0x18/0x28
> > of_node_put+0x50/0x70
> > platform_device_release+0x24/0x68
> > device_release+0x3c/0xa0
> > kobject_put+0xa4/0x118
> > device_link_release_fn+0x60/0xd8
> > process_one_work+0x158/0x3c0
> > worker_thread+0x2d8/0x3e8
> > kthread+0x118/0x128
> > ret_from_fork+0x10/0x20
> >
> > So for some reason device_link_release_fn() is not leading to a
> > of_node_put() call after adding your patch.
> >
> > Quick code inspection did not show any useful info for me to understand
> > more.
>
> I just sent a patch fixing
> this: https://lore.kernel.org/20250212-fix__fw_devlink_relax_cycles_missing_device_put-v1-1-41818c7d7722@bootlin.com

Thanks a lot for debugging and fixing this! I'll review that patch.

-Saravana
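
For readers following the thread, the failure mode described in the quoted report can be sketched in plain userspace C. This is only an illustration, not the kernel code and not the linked patch: the toy_* types and functions are hypothetical stand-ins for the of_node and device refcounts, and the assumption that the leak comes from a device reference that is taken but never dropped is inferred from the stack trace above (of_node_put() reached only through device_release()) and from the title of the linked fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a device tree node and a device. */
struct toy_node {
	int refcount;
};

struct toy_device {
	int refcount;
	struct toy_node *node;
	bool released;
};

static struct toy_node *toy_node_get(struct toy_node *n)
{
	n->refcount++;
	return n;
}

static void toy_node_put(struct toy_node *n)
{
	assert(n->refcount > 0);
	n->refcount--;
}

/* The device holds one reference on its node for its whole lifetime. */
static void toy_device_init(struct toy_device *d, struct toy_node *n)
{
	d->refcount = 1;
	d->node = toy_node_get(n);
	d->released = false;
}

static struct toy_device *toy_device_get(struct toy_device *d)
{
	d->refcount++;
	return d;
}

/* Dropping the last device reference is the only path that puts the node. */
static void toy_device_put(struct toy_device *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		toy_node_put(d->node);
		d->released = true;
	}
}

/*
 * Returns the node refcount seen at "overlay removal" time.
 * balance_lookup_ref == false models a code path that takes a
 * temporary device reference and forgets to drop it.
 */
int simulate_overlay_removal(bool balance_lookup_ref)
{
	struct toy_node node = { .refcount = 1 };	/* changeset's own reference */
	struct toy_device dev;
	struct toy_device *found;

	toy_device_init(&dev, &node);			/* node refcount is now 2 */

	found = toy_device_get(&dev);			/* temporary lookup reference */
	if (balance_lookup_ref)
		toy_device_put(found);

	toy_device_put(&dev);				/* removal drops the initial reference */
	return node.refcount;
}
```

With a balanced lookup, simulate_overlay_removal(true) returns 1, which is what the overlay code expects. With the unbalanced lookup, simulate_overlay_removal(false) returns 2: the device is never released, so the node reference it holds is never put, matching the "expected refcount 1 instead of 2" shape of the errors quoted above.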