Message-ID: <93d0adf0ee6f3737af4d482dc206fe152f762482.camel@gmail.com>
Date: Mon, 04 Mar 2024 16:36:15 +0100
From: Nuno Sá <noname.nuno@...il.com>
To: Rob Herring <robh@...nel.org>
Cc: Herve Codina <herve.codina@...tlin.com>, Greg Kroah-Hartman
 <gregkh@...uxfoundation.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
 Frank Rowand <frowand.list@...il.com>, Lizhi Hou <lizhi.hou@....com>, Max
 Zhen <max.zhen@....com>,  Sonal Santan <sonal.santan@....com>, Stefano
 Stabellini <stefano.stabellini@...inx.com>, Jonathan Cameron
 <Jonathan.Cameron@...wei.com>, linux-kernel@...r.kernel.org, 
 devicetree@...r.kernel.org, Allan Nielsen <allan.nielsen@...rochip.com>, 
 Horatiu Vultur <horatiu.vultur@...rochip.com>, Steen Hegelund
 <steen.hegelund@...rochip.com>, Luca Ceresoli <luca.ceresoli@...tlin.com>,
 Nuno Sa <nuno.sa@...log.com>, Thomas Petazzoni
 <thomas.petazzoni@...tlin.com>, stable@...r.kernel.org
Subject: Re: [PATCH v3 2/2] of: overlay: Synchronize of_overlay_remove()
 with the devlink removals

On Mon, 2024-03-04 at 09:22 -0600, Rob Herring wrote:
> On Thu, Feb 29, 2024 at 12:18:49PM +0100, Nuno Sá wrote:
> > On Thu, 2024-02-29 at 11:52 +0100, Herve Codina wrote:
> > > In the following sequence:
> > >   1) of_platform_depopulate()
> > >   2) of_overlay_remove()
> > > 
> > > During the step 1, devices are destroyed and devlinks are removed.
> > > During the step 2, OF nodes are destroyed but
> > > __of_changeset_entry_destroy() can raise warnings related to missing
> > > of_node_put():
> > >   ERROR: memory leak, expected refcount 1 instead of 2 ...
> > > 
> > > Indeed, during the devlink removals performed at step 1, the removal
> > > itself releasing the device (and the attached of_node) is done by a job
> > > queued in a workqueue and so, it is done asynchronously with respect to
> > > function calls.
> > > When the warning is present, of_node_put() will be called but wrongly
> > > too late from the workqueue job.
> > > 
> > > In order to be sure that any ongoing devlink removals are done before
> > > the of_node destruction, synchronize the of_overlay_remove() with the
> > > devlink removals.
> > > 
> > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > Cc: stable@...r.kernel.org
> > > Signed-off-by: Herve Codina <herve.codina@...tlin.com>
> > > ---
> > >  drivers/of/overlay.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > index 2ae7e9d24a64..7a010a62b9d8 100644
> > > --- a/drivers/of/overlay.c
> > > +++ b/drivers/of/overlay.c
> > > @@ -8,6 +8,7 @@
> > >  
> > >  #define pr_fmt(fmt)	"OF: overlay: " fmt
> > >  
> > > +#include <linux/device.h>
> > 
> > > This is clearly up to the DT maintainers to decide but, IMHO, I would very much
> > > prefer to see fwnode.h included in here rather than directly device.h (so yeah,
> > > renaming the function to fwnode_*).
> 
> IMO, the DT code should know almost nothing about fwnode because that's 
> the layer above it. But then overlay stuff is kind of a layer above the 
> core DT code too.

Yeah, my reasoning is just that it may be better than knowing about device.h
code... But maybe I'm wrong :)

> 
> > > But yeah, I might be biased by my own series :)
> > 
> > >  #include <linux/kernel.h>
> > >  #include <linux/module.h>
> > >  #include <linux/of.h>
> > > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct
> > > overlay_changeset *ovcs)
> > >  {
> > >  	int i;
> > >  
> > > +	/*
> > > > +	 * Wait for any ongoing device link removals before removing some of
> > > > +	 * nodes. Drop the global lock while waiting
> > > +	 */
> > > +	mutex_unlock(&of_mutex);
> > > +	device_link_wait_removal();
> > > +	mutex_lock(&of_mutex);
> > 
> > > I'm still not convinced we need to drop the lock. What happens if someone else
> > > grabs the lock while we are in device_link_wait_removal()? Can we guarantee that
> > > we can't screw things badly?
> 
> It is also just ugly because it's the callers of 
> free_overlay_changeset() that hold the lock and now we're releasing it 
> behind their back.
> 
> As device_link_wait_removal() is called before we touch anything, can't 
> it be called before we take the lock? And do we need to call it if 
> applying the overlay fails?
> 

My natural feeling was to put it right before checking the node refcount... and
I would still like to see proof that there's any potential deadlock. I did not
check the code, but the issue with calling it before we take the lock is that
the device links likely won't be removed yet, because the overlay removal path
(which unbinds devices from drivers) needs to run under the lock?
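
To make the ordering concern concrete, here is a minimal userspace sketch (not
kernel code) of why the wait has to happen before the refcount check. Every name
in it (fake_node, queue_deferred_put(), wait_removal(), looks_leaked()) is a
hypothetical stand-in, and the "workqueue" is just an array that is drained on
demand. It only models the deferred put; it says nothing about of_mutex or
whether a deadlock is possible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OF node carrying a reference count. */
struct fake_node {
	int refcount;
};

/* Hypothetical single-threaded "workqueue": jobs run only when flushed. */
#define MAX_JOBS 8
static struct fake_node *pending[MAX_JOBS];
static size_t npending;

/* Models a device link removal: the node put is queued, not done inline. */
static int queue_deferred_put(struct fake_node *node)
{
	if (npending == MAX_JOBS)
		return -1;
	pending[npending++] = node;
	return 0;
}

/* Stand-in for device_link_wait_removal(): run every queued put. */
static void wait_removal(void)
{
	size_t i;

	for (i = 0; i < npending; i++) {
		assert(pending[i]->refcount > 0);
		pending[i]->refcount--;
	}
	npending = 0;
}

/* Models the "expected refcount 1" check; returns 1 if the node looks leaked. */
static int looks_leaked(const struct fake_node *node)
{
	return node->refcount != 1;
}
```

With a node at refcount 2 and one put queued, looks_leaked() reports a leak
until wait_removal() has run, which is the warning from the commit message.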

- Nuno Sá
