Date: Thu, 29 Feb 2024 12:18:49 +0100
From: Nuno Sá <noname.nuno@...il.com>
To: Herve Codina <herve.codina@...tlin.com>, Greg Kroah-Hartman
	 <gregkh@...uxfoundation.org>, "Rafael J. Wysocki" <rafael@...nel.org>, Rob
	Herring <robh+dt@...nel.org>, Frank Rowand <frowand.list@...il.com>
Cc: Lizhi Hou <lizhi.hou@....com>, Max Zhen <max.zhen@....com>, Sonal Santan
 <sonal.santan@....com>, Stefano Stabellini <stefano.stabellini@...inx.com>,
  Jonathan Cameron <Jonathan.Cameron@...wei.com>,
 linux-kernel@...r.kernel.org, devicetree@...r.kernel.org, Allan Nielsen
 <allan.nielsen@...rochip.com>, Horatiu Vultur
 <horatiu.vultur@...rochip.com>, Steen Hegelund
 <steen.hegelund@...rochip.com>, Luca Ceresoli <luca.ceresoli@...tlin.com>,
 Nuno Sa <nuno.sa@...log.com>, Thomas Petazzoni
 <thomas.petazzoni@...tlin.com>,  stable@...r.kernel.org
Subject: Re: [PATCH v3 2/2] of: overlay: Synchronize of_overlay_remove()
 with the devlink removals

On Thu, 2024-02-29 at 11:52 +0100, Herve Codina wrote:
> In the following sequence:
>   1) of_platform_depopulate()
>   2) of_overlay_remove()
> 
> During step 1, devices are destroyed and devlinks are removed.
> During step 2, OF nodes are destroyed, but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
>   ERROR: memory leak, expected refcount 1 instead of 2 ...
> 
> Indeed, during the devlink removals performed in step 1, the removal
> itself, which releases the device (and the attached of_node), is done by
> a job queued in a workqueue, so it runs asynchronously with respect to
> the calling code.
> When the warning is present, of_node_put() is eventually called from the
> workqueue job, but too late.
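A userspace model of the ordering problem described above may help. This is a pthreads sketch, not kernel code: `struct node`, `queue_put()`, `worker()`, `destroy_check()` and `remove_overlay()` are hypothetical names, a one-slot "queue" stands in for the workqueue, and the check only mimics the spirit of the __of_changeset_entry_destroy() warning. Checking the refcount before the queued put has run sees 2; waiting for the job first sees 1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical node whose last reference drop is deferred to a job */
struct node {
    atomic_int refcount;
};

/* Hypothetical one-slot "workqueue": the put is queued, not run */
struct deferred_put {
    struct node *node;
    int queued;
};

static void queue_put(struct deferred_put *job, struct node *n)
{
    assert(!job->queued);          /* only one slot in this model */
    job->node = n;
    job->queued = 1;
}

/* Worker thread body: performs the deferred put if one is queued */
static void *worker(void *arg)
{
    struct deferred_put *job = arg;

    if (job->queued) {
        atomic_fetch_sub(&job->node->refcount, 1);
        job->queued = 0;
    }
    return NULL;
}

/* Modeled on the destroy-time check: warn unless refcount is 1 */
static int destroy_check(struct node *n)
{
    int ref = atomic_load(&n->refcount);

    if (ref != 1)
        fprintf(stderr, "ERROR: memory leak, expected refcount 1 instead of %d\n", ref);
    return ref;
}

/* Returns the refcount seen by the check */
static int remove_overlay(int wait_for_worker)
{
    struct node n;
    struct deferred_put job = {0};
    pthread_t t;
    int ref;

    atomic_init(&n.refcount, 2);   /* overlay ref + device ref */
    queue_put(&job, &n);           /* step 1: device ref dropped later */

    if (wait_for_worker) {
        pthread_create(&t, NULL, worker, &job);
        pthread_join(t, NULL);     /* wait for the queued job */
        ref = destroy_check(&n);
    } else {
        ref = destroy_check(&n);   /* the job has not run yet */
        pthread_create(&t, NULL, worker, &job);
        pthread_join(t, NULL);     /* the put arrives too late */
    }
    return ref;
}
```

`remove_overlay(0)` prints the leak warning and returns 2, while `remove_overlay(1)` returns 1. Build with `-pthread`.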
> 
> To make sure that any ongoing devlink removals have completed before the
> of_node is destroyed, synchronize of_overlay_remove() with the devlink
> removals.
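The synchronization requested here amounts to "block until every removal already queued has finished". As a sketch only, it assumes a pending counter plus a condition variable. This is a swapped-in technique: the series may implement device_link_wait_removal() differently, for instance by flushing the workqueue the jobs run on. All names below are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical tracker of in-flight link removals (not a kernel API) */
struct removal_tracker {
    pthread_mutex_t lock;
    pthread_cond_t idle;
    int pending;
};

static struct removal_tracker tracker = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

static void removal_begin(void)
{
    pthread_mutex_lock(&tracker.lock);
    tracker.pending++;
    pthread_mutex_unlock(&tracker.lock);
}

static void removal_end(void)
{
    pthread_mutex_lock(&tracker.lock);
    assert(tracker.pending > 0);
    if (--tracker.pending == 0)
        pthread_cond_broadcast(&tracker.idle);
    pthread_mutex_unlock(&tracker.lock);
}

/* Block until every removal that has begun has also ended */
static void wait_removal(void)
{
    pthread_mutex_lock(&tracker.lock);
    while (tracker.pending > 0)
        pthread_cond_wait(&tracker.idle, &tracker.lock);
    pthread_mutex_unlock(&tracker.lock);
}

/* Stand-in for the asynchronous job: some latency, then completion */
static void *removal_job(void *arg)
{
    struct timespec delay = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    (void)arg;
    nanosleep(&delay, NULL);
    removal_end();
    return NULL;
}

/* Queue one removal, wait for it, report how many are still pending */
static int remove_then_wait(void)
{
    pthread_t t;
    int left;

    removal_begin();               /* counted before the hand-off */
    pthread_create(&t, NULL, removal_job, NULL);
    wait_removal();

    pthread_mutex_lock(&tracker.lock);
    left = tracker.pending;
    pthread_mutex_unlock(&tracker.lock);

    pthread_join(t, NULL);
    return left;
}
```

Calling removal_begin() before the job is handed off matters. If it were called from inside the job instead, wait_removal() could observe zero and return before the job even started.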
> 
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: stable@...r.kernel.org
> Signed-off-by: Herve Codina <herve.codina@...tlin.com>
> ---
>  drivers/of/overlay.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index 2ae7e9d24a64..7a010a62b9d8 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -8,6 +8,7 @@
>  
>  #define pr_fmt(fmt)	"OF: overlay: " fmt
>  
> +#include <linux/device.h>

This is clearly for the DT maintainers to decide, but IMHO I would much prefer
to see fwnode.h included here rather than device.h directly (which, yes, means
renaming the function to fwnode_*).

But yeah, I might be biased by my own series :)

>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
> @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs)
>  {
>  	int i;
>  
> +	/*
> +	 * Wait for any ongoing device link removals before removing some of
> +	 * nodes. Drop the global lock while waiting
> +	 */
> +	mutex_unlock(&of_mutex);
> +	device_link_wait_removal();
> +	mutex_lock(&of_mutex);

I'm still not convinced we need to drop the lock. What happens if someone else
grabs the lock while we are in device_link_wait_removal()? Can we guarantee that
we won't screw things up badly?
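This concern can be made concrete. Once the lock is dropped, any state read under it may be stale after the lock is retaken. One common mitigation is to revalidate after relocking. The pthreads sketch below shows it: `big_lock` stands in for of_mutex, `tree_generation` is an invented change counter, and nothing in this thread proposes this approach.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical global state guarded by one lock (stand-in for of_mutex) */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long tree_generation;   /* bumped on every tree change */

/* Another user that grabs the lock while it is dropped */
static void *concurrent_writer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);
    tree_generation++;                  /* e.g. another overlay applied */
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/*
 * Drop the lock around a wait, then detect whether anyone changed the
 * protected state meanwhile. Returns 1 if a change was observed.
 */
static int wait_unlocked_and_revalidate(int let_writer_in)
{
    unsigned long seen;
    pthread_t t;
    int changed;

    pthread_mutex_lock(&big_lock);
    seen = tree_generation;

    pthread_mutex_unlock(&big_lock);
    if (let_writer_in) {
        /* Stand-in for the wait: someone else takes the lock now */
        pthread_create(&t, NULL, concurrent_writer, NULL);
        pthread_join(t, NULL);
    }
    pthread_mutex_lock(&big_lock);

    assert(tree_generation >= seen);
    changed = (tree_generation != seen);
    pthread_mutex_unlock(&big_lock);
    return changed;
}
```

If a change is detected, the caller would have to restart or abort. Whether free_overlay_changeset() could do either safely is part of the open question.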

The real question is: do you have a system or use case where you can actually see
the deadlock happening? Until I see one, I'm very skeptical about this. And even if
we have one, I'm not sure this is the right solution for it.
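Dropping the lock would only be justified by a deadlock if the deferred removal job itself needs the same lock. Whether it does here is exactly what is being asked. The sketch below simply assumes it does and uses pthread_mutex_timedlock() so the demo reports ETIMEDOUT instead of hanging. `global_lock` and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical: the deferred removal job needs the same global lock */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

static void *removal_job_needing_lock(void *arg)
{
    int *result = arg;
    struct timespec deadline;

    /* A plain pthread_mutex_lock() here would hang forever in the
     * "hold the lock while waiting" case; time out instead. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200L * 1000 * 1000;   /* 200 ms */
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    *result = pthread_mutex_timedlock(&global_lock, &deadline);
    if (*result == 0)
        pthread_mutex_unlock(&global_lock);
    return NULL;
}

/* Returns 0 if the job got the lock, ETIMEDOUT if it would have deadlocked */
static int wait_for_job(int drop_lock_while_waiting)
{
    pthread_t t;
    int result = -1;

    pthread_mutex_lock(&global_lock);
    pthread_create(&t, NULL, removal_job_needing_lock, &result);

    if (drop_lock_while_waiting)
        pthread_mutex_unlock(&global_lock);
    pthread_join(t, NULL);                   /* "wait for removal" */
    if (drop_lock_while_waiting)
        pthread_mutex_lock(&global_lock);

    pthread_mutex_unlock(&global_lock);
    assert(result != -1);
    return result;
}
```

If the job never touches the lock, both variants return 0, and dropping the lock buys nothing while opening the window described above.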

- Nuno Sá


