Message-ID: <CAGETcx_zB95nyTpi-_kYW_VqnPqMEc8mS9sewSwRNVr0x=7+kA@mail.gmail.com>
Date: Tue, 20 Feb 2024 16:37:05 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Herve Codina <herve.codina@...tlin.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
Rob Herring <robh+dt@...nel.org>, Frank Rowand <frowand.list@...il.com>,
Lizhi Hou <lizhi.hou@....com>, Max Zhen <max.zhen@....com>,
Sonal Santan <sonal.santan@....com>, Stefano Stabellini <stefano.stabellini@...inx.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>, linux-kernel@...r.kernel.org,
devicetree@...r.kernel.org, Allan Nielsen <allan.nielsen@...rochip.com>,
Horatiu Vultur <horatiu.vultur@...rochip.com>,
Steen Hegelund <steen.hegelund@...rochip.com>,
Thomas Petazzoni <thomas.petazzoni@...tlin.com>, Luca Ceresoli <luca.ceresoli@...tlin.com>,
Nuno Sa <nuno.sa@...log.com>, Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the
devlink removals
On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <herve.codina@...tlin.com> wrote:
>
> In the following sequence:
> 1) of_platform_depopulate()
> 2) of_overlay_remove()
>
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
> ERROR: memory leak, expected refcount 1 instead of 2 ...
>
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
>
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_overlay_remove() with the
> devlink removals.
>
Please add a Fixes tag for this one too, pointing to the change that added the workqueue.
Please CC Nuno and Luca in your v2 series.
> Signed-off-by: Herve Codina <herve.codina@...tlin.com>
> ---
> drivers/of/overlay.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index a9a292d6d59b..5c5f808b163e 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> goto out;
> }
>
> + /*
> + * Wait for any ongoing device link removals before removing some of
> + * nodes
> + */
> + device_link_wait_removal();
> +
Nuno in his patch[1] had this "wait" happen inside
__of_changeset_entry_destroy(), which seems to be necessary to avoid
the issue that Luca reported[2] in this patch series. Is there any
problem with doing that?
Luca for some reason did an unlock/lock(of_mutex) in his test patch and
I don't think that's necessary.
Can you move this call to where Nuno did it and see if that works for
all of you?
[1] - https://lore.kernel.org/all/20240205-fix-device-links-overlays-v2-2-5344f8c79d57@analog.com/
[2] - https://lore.kernel.org/all/20231220181627.341e8789@booty/
Thanks,
Saravana
> mutex_lock(&of_mutex);
>
> ovcs = idr_find(&ovcs_idr, *ovcs_id);
> --
> 2.42.0
>
>