Message-ID: <86f2262d059db84070745e299d96dde3e6078220.camel@gmail.com>
Date: Tue, 27 Feb 2024 17:55:07 +0100
From: Nuno Sá <noname.nuno@...il.com>
To: Herve Codina <herve.codina@...tlin.com>, Saravana Kannan
	 <saravanak@...gle.com>, Luca Ceresoli <luca.ceresoli@...tlin.com>, Nuno Sa
	 <nuno.sa@...log.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, "Rafael J. Wysocki"
 <rafael@...nel.org>, Rob Herring <robh+dt@...nel.org>, Frank Rowand
 <frowand.list@...il.com>, Lizhi Hou <lizhi.hou@....com>, Max Zhen
 <max.zhen@....com>, Sonal Santan <sonal.santan@....com>, Stefano Stabellini
 <stefano.stabellini@...inx.com>, Jonathan Cameron
 <Jonathan.Cameron@...wei.com>,  linux-kernel@...r.kernel.org,
 devicetree@...r.kernel.org, Allan Nielsen <allan.nielsen@...rochip.com>,
 Horatiu Vultur <horatiu.vultur@...rochip.com>,  Steen Hegelund
 <steen.hegelund@...rochip.com>, Thomas Petazzoni
 <thomas.petazzoni@...tlin.com>, Android Kernel Team
 <kernel-team@...roid.com>
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with
 the devlink removals

On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote:
> Hi Saravana, Luca, Nuno,
> 
> On Tue, 20 Feb 2024 16:37:05 -0800
> Saravana Kannan <saravanak@...gle.com> wrote:
> 
> ...
> 
> > > 
> > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > index a9a292d6d59b..5c5f808b163e 100644
> > > --- a/drivers/of/overlay.c
> > > +++ b/drivers/of/overlay.c
> > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > >                 goto out;
> > >         }
> > > 
> > > +       /*
> > > +        * Wait for any ongoing device link removals before removing some of
> > > +        * nodes
> > > +        */
> > > +       device_link_wait_removal();
> > > +  
> > 
> > Nuno in his patch[1] had this "wait" happen inside
> > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > the issue that Luca reported[2] in this patch series. Is there any
> > problem with doing that?
> > 
> > Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> > I don't think that's necessary.
> 
> I think the unlock/lock in Luca's case and so in Nuno's case is needed.
> 
> I do the device_link_wait_removal() without having the of_mutex locked.
> 
> Now, suppose I do the device_link_wait_removal() call with the of_mutex locked.
> The following flow is allowed and a deadlock is present.
> 
> of_overlay_remove()
>   lock(of_mutex)
>      device_link_wait_removal()
> 
> And, from the workqueue jobs execution:
>   ...
>     device_put()
>       some_driver->remove()
>         of_overlay_remove() <--- The job will never end.
>                                  It is waiting for of_mutex.
>                                  Deadlock
> 
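
For reference, the flow quoted above can be reduced to a toy, single-threaded
userspace model. This is only a sketch: every *_model name is hypothetical, the
"mutex" is a plain flag, and it assumes (as the quoted flow does) that a queued
removal job can end up calling of_overlay_remove(). Whether that assumption holds
is exactly what is being questioned below.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for of_mutex and the devlink removal workqueue. */
static bool of_mutex_held;
static bool job_pending;
static bool job_would_block;

/* Returns false at the point where a real mutex_lock() would sleep. */
static bool of_mutex_trylock_model(void)
{
	if (of_mutex_held)
		return false;
	of_mutex_held = true;
	return true;
}

static void of_mutex_unlock_model(void)
{
	of_mutex_held = false;
}

/* Hypothetical job: ... -> some_driver->remove() -> of_overlay_remove(). */
static void removal_job_model(void)
{
	if (!of_mutex_trylock_model()) {
		/* The real job would sleep here forever. */
		job_would_block = true;
		return;
	}
	of_mutex_unlock_model();
}

/* Stand-in for device_link_wait_removal(): run whatever is queued. */
static void wait_removal_model(void)
{
	if (job_pending) {
		job_pending = false;
		removal_job_model();
	}
}

/* Returns true if the queued job would have blocked on of_mutex. */
static bool overlay_remove_model(bool wait_under_lock)
{
	bool locked;

	job_would_block = false;
	job_pending = true;

	if (!wait_under_lock)
		wait_removal_model();	/* wait before taking the lock */

	locked = of_mutex_trylock_model();
	assert(locked);

	if (wait_under_lock)
		wait_removal_model();	/* wait while holding the lock */

	of_mutex_unlock_model();
	return job_would_block;
}
```

In this model the job only blocks when the wait happens with the lock held, which
is the ordering concern raised in the quoted text.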

We may need some input from Saravana (and others) on this. I might be missing
something, but can a put_device() lead to a driver remove callback? Driver code is
not device code, and put_device() leads to device_release(), which calls one of the
device ->release(), ->type->release() or the class ->dev_release(). And, IMO, calling
of_overlay_remove() or something similar (anything that would lead to
unbinding a device from its driver) in a device release callback would be, at the
very least, very questionable. Typically, what you see in there is of_node_put() and
things like kfree() of the device itself or any other data.
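
To make the release path concrete, here is a minimal userspace sketch of the
callback selection described above. It is a model, not the driver core code: the
structs are cut down to the relevant fields, all *_model names and the released_by
marker are made up for illustration, and the refcount is a plain int rather than
a kobject/kref.

```c
#include <assert.h>
#include <stddef.h>

struct device;

enum released_by { NOT_RELEASED, BY_DEVICE, BY_TYPE, BY_CLASS, BY_NOBODY };

struct device_type_model {
	void (*release)(struct device *dev);
};

struct class_model {
	void (*dev_release)(struct device *dev);
};

struct device {
	int refcount;			/* stand-in for the kobject refcount */
	enum released_by released_by;	/* records which callback ran */
	void (*release)(struct device *dev);
	const struct device_type_model *type;
	const struct class_model *cls;	/* "class" in the kernel struct */
};

/* Pick exactly one release callback, in the order described above. */
static void device_release_model(struct device *dev)
{
	if (dev->release)
		dev->release(dev);
	else if (dev->type && dev->type->release)
		dev->type->release(dev);
	else if (dev->cls && dev->cls->dev_release)
		dev->cls->dev_release(dev);
	else
		dev->released_by = BY_NOBODY;	/* the kernel warns in this case */
}

/* Drop one reference; the release path only runs on the last put. */
static void put_device_model(struct device *dev)
{
	assert(dev != NULL && dev->refcount > 0);
	if (--dev->refcount == 0)
		device_release_model(dev);
}

/* Example callbacks: a real one would typically of_node_put() and kfree(). */
static void dev_release_cb(struct device *dev)   { dev->released_by = BY_DEVICE; }
static void type_release_cb(struct device *dev)  { dev->released_by = BY_TYPE; }
static void class_release_cb(struct device *dev) { dev->released_by = BY_CLASS; }
```

Nothing in this path goes through a driver's remove callback; only device-side
release callbacks are reachable from the final put.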

The driver remove callback should be called when unbinding the device from its
driver, and devlinks should also be removed after device_unbind_cleanup() (i.e., after
the driver remove callback).

Having said the above, the driver core has lots of subtleties so, again, I may be
missing something. But at this point I'm still not seeing any deadlock...

- Nuno Sá

