linux-kernel - Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <201106220149.17380.rjw@sisk.pl>
Date:	Wed, 22 Jun 2011 01:49:17 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Alan Stern <stern@...land.harvard.edu>
Cc:	Linux PM mailing list <linux-pm@...ts.linux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>
Subject: Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep

On Tuesday, June 21, 2011, Alan Stern wrote:
> On Mon, 20 Jun 2011, Rafael J. Wysocki wrote:
> 
> > > Ah, okay.  The PCI part makes sense then.
> > 
> > OK, so the appended patch is a modification of the $subject one using
> > pm_runtime_put_sync() instead of pm_runtime_put_noidle().
> 
> Yes, it looks good.

Cool, thanks!

> > So, your point is that while .suspend() or .resume() are running, the
> > synchronization between runtime PM and system suspend/resume should be the
> > subsystem's problem, right?
> 
> Almost but not quite.  I was talking about the time period between 
> .prepare() and .suspend() (and also the time period between .resume() 
> and .complete()).
> 
> It's probably okay to prevent pm_runtime_suspend() from working during
> .suspend() or .resume(), but it's not a good idea to prevent
> pm_runtime_resume() from working then.

OK, but taking a reference by means of pm_runtime_get_noresume() won't
block pm_runtime_resume().

> > I actually see a reason for doing this.  Namely, I don't really think
> > driver writers should be bothered with preventing races between different
> > PM callbacks from happening.  Runtime PM takes care of that at run time,
> > the design of the system suspend/resume code ensures that the callbacks
> > for the same device are executed sequentially, but if we allow runtime PM
> > callbacks to be executed in parallel with system suspend/resume callbacks,
> > someone has to prevent those callbacks from racing with each other.
> > 
> > Now, if you agree that that shouldn't be a driver's task, then it has to
> > be the subsystem's one and I'm not sure what a subsystem can do other than
> > disabling runtime PM or at least taking a reference on every device before
> > calling device drivers' .suspend() callbacks.
> > 
> > Please note, I think that .prepare() and .complete() are somewhat special,
> > so perhaps we should allow those to race with runtime PM callbacks, but IMO
> > allowing .suspend() and .resume() to race with .runtime_suspend() and
> > .runtime_resume() is not a good idea.
> 
> Races in the period after .suspend() and before .resume() will be 
> handled by disabling runtime PM when .suspend() returns and enabling it 
> before calling .resume().

OK

> During the .suspend and .resume callbacks, races with
> .runtime_suspend() can be prevented by calling
> pm_runtime_get_noresume() just before .suspend() and then calling
> pm_runtime_put_sync() just after .resume().

So, you seem to suggest to call pm_runtime_get_noresume() in
__device_suspend() and pm_runtime_put_sync() in device_resume().
That would be fine by me, perhaps up to the "sync" part of the "put".

> Races with .runtime_resume() can be handled to some extent by putting a
> runtime barrier immediately after the pm_runtime_get_noresume() call,
> but that's not a perfect solution.  Is it good enough?

It's not worse than what we had before, so I guess it should be enough.

> > > What I'm suggesting is to revert the commit but at the same time,
> > > move the get_noresume() into __device_suspend() and the put_sync() into 
> > > device_resume().
> > 
> > What about doing pm_runtime_get_noresume() and the pm_runtime_barrier()
> > in dpm_prepare(), but _after_ calling device_prepare() and doing
> > pm_runtime_put_noidle() in dpm_complete() _before_ calling .complete()
> > from the subsystem
> 
> This does not address the issue of allowing runtime suspends in the 
> windows between .prepare() - .suspend() and .resume() - .complete().

OK

> >  (a _put_sync() at this point will likely invoke
> > .runtime_idle() from the subsystem before executing .complete(), which may
> > not be desirable)?
> 
> It should be allowed.  The purpose of .complete() is not to re-enable
> runtime power management of the device; it is to release resources
> (like memory) allocated during .prepare() and perhaps also to allow new
> children to be registered under the device.

Right.  But does "allowed" mean the core _should_ do it at this point?
We may as well call pm_runtime_idle() directly from rpm_complete(), but
perhaps it's better to call it from device_resume(), so that it runs in
parallel for async devices.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/