linux-kernel - Re: [RFC][PATCH] PM: Force GFP_NOIO during suspend/resume (was: Re: [linux-pm] Memory allocations in .suspend became very unreliable)


Message-Id: <201001202221.34804.rjw@sisk.pl>
Date:	Wed, 20 Jan 2010 22:21:34 +0100
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Maxim Levitsky <maximlevitsky@...il.com>,
	linux-pm@...ts.linux-foundation.org,
	LKML <linux-kernel@...r.kernel.org>,
	"linux-mm" <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC][PATCH] PM: Force GFP_NOIO during suspend/resume (was: Re: [linux-pm] Memory allocations in .suspend became very unreliable)

On Wednesday 20 January 2010, KOSAKI Motohiro wrote:
> > On Tuesday 19 January 2010, Benjamin Herrenschmidt wrote:
> > > On Tue, 2010-01-19 at 10:19 +0900, KOSAKI Motohiro wrote:
> > > > I think the race itself is the problem. The memory and I/O subsystems can't solve
> > > > such a race elegantly. They don't know enough about the suspend state. I think the
> > > > practical solution is for the higher-level design to prevent the race from happening.
> > > > 
> > > > 
> > > > > My patch attempts to avoid these two problems as well as the problem with
> > > > > drivers using GFP_KERNEL allocations during suspend which I admit might be
> > > > > solved by reworking the drivers.
> > > > 
> > > > Agreed. In this case, only driver changes can solve the issue.
> > > 
> > > As I explained earlier, this is nearly impossible, since the allocations
> > > are too often buried deep down the call stack, or simply because the
> > > driver doesn't know that we have started suspending -another- driver...
> > > 
> > > I don't think trying to solve those problems at the driver level is
> > > realistic to be honest. This is one of those things where we really just
> > > need to make allocators 'just work' from a driver perspective.
> > > 
> > > It can't be perfect of course, as mentioned earlier, there will be a
> > > problem if too little free memory is really available due to lots of
> > > dirty pages around, but most of this can be somewhat alleviated in
> > > practice, for example by pushing things out a bit at suspend time,
> > > making some more memory free etc... But yeah, nothing replaces proper
> > > error handling in drivers for allocation failures even with
> > > GFP_KERNEL :-)
> > 
> > Agreed.
> > 
> > Moreover, I didn't try to do anything about that before, because memory
> > allocation problems during suspend/resume just didn't happen.  We kind of knew
> > they were possible, but since they didn't show up, it wasn't immediately
> > necessary to address them.
> > 
> > Now, however, people started to see these problems in testing and I'm quite
> > confident that this is a result of recent changes in the mm subsystem.  Namely,
> > if you read Maxim's report carefully, you'll notice that in his test case
> > the mm subsystem apparently attempted to use I/O even though there was free
> > memory available in the system.  This is the case I want to prevent from
> > happening in the first place.
> 
> Hi Rafael,
> 
> Do you mean this issue is unrelated to the nVidia bug?

The nvidia driver _is_ buggy, but Maxim said he couldn't reproduce the
problem if all the allocations made by the nvidia driver during suspend
were changed to GFP_ATOMIC.
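
For readers following the subject line, the idea of forcing GFP_NOIO
during suspend/resume can be pictured as a global mask applied to every
allocation request. What follows is only a user-space sketch under
stated assumptions: the flag values, the sk_ names and the helper
functions are all hypothetical, and none of them is taken from the patch
or from the kernel headers.

```c
/* Hypothetical flag bits for illustration only; the kernel's values differ. */
#define SK_GFP_WAIT   0x1u  /* caller may sleep */
#define SK_GFP_IO     0x2u  /* allocator may start block I/O */
#define SK_GFP_FS     0x4u  /* allocator may call into filesystems */

#define SK_GFP_ATOMIC 0x0u
#define SK_GFP_NOIO   (SK_GFP_WAIT)
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* Bits that allocations are currently allowed to use (hypothetical). */
unsigned int sk_allowed_mask = ~0u;

/* Called when suspend starts: stop the allocator from relying on I/O. */
void sk_suspend_begin(void)
{
	sk_allowed_mask &= ~(SK_GFP_IO | SK_GFP_FS);
}

/* Called when resume has finished: allow I/O-backed reclaim again. */
void sk_resume_end(void)
{
	sk_allowed_mask = ~0u;
}

/* What the allocator would actually honour for a given request. */
unsigned int sk_effective_flags(unsigned int requested)
{
	return requested & sk_allowed_mask;
}
```

In this sketch a driver that asks for SK_GFP_KERNEL between
sk_suspend_begin() and sk_resume_end() is silently treated as if it had
asked for SK_GFP_NOIO, so it needs no knowledge of the suspend state.
An SK_GFP_ATOMIC request is unaffected, because it never carried the I/O
bits in the first place.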

> I probably haven't caught your point. I can't find Maxim's original bug
> report. Could you share the test case and the details of your analysis?

Maxim's original report is here:
https://lists.linux-foundation.org/pipermail/linux-pm/2010-January/023982.html

and the message I'm referring to is at:
https://lists.linux-foundation.org/pipermail/linux-pm/2010-January/023990.html

Thanks,
Rafael