linux-kernel - Re: [patch 0/2] suspend/resume regression fixes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.0.999.0709221546250.16478@woody.linux-foundation.org>
Date:	Sat, 22 Sep 2007 15:59:25 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Thomas Gleixner <tglx@...utronix.de>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	Len Brown <lenb@...nel.org>,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>
Subject: Re: [patch 0/2] suspend/resume regression fixes

On Sat, 22 Sep 2007, Thomas Gleixner wrote:
> 
> My final enlightment was, when I removed the ACPI processor module,
> which controls the lower idle C-states, right before resume; this
> worked fine all the time even without all the workaround hacks.
> 
> I really hope that this two patches finally set an end to the "jinxed
> VAIO heisenbug series", which started when we removed the periodic
> tick with the clockevents/dyntick patches.

Ok, so the patches look fine, but I somehow have this slight feeling that 
you gave up a bit too soon on the "*why* does this happen?" question.

I realize that the answer is easily "because ACPI screwed up", but I'm 
wondering if there's something we do to trigger that screw-up.

In particular, I also suspect that this may not really fix the problem - 
maybe it just makes the window sufficiently small that it no longer 
triggers. Because we don't necessarily understand what the real background 
for the problem is, I'm not sure we can say that it is solved.

The reason I say this is that I have a suspicion on what triggers it.

I suspect that the problem is that we do

	pm_ops->prepare();
	disable_nonboot_cpus()
	suspend_enter();
	enable_nonboot_cpus()
	pm_finish()

and here the big thing to notice is that "pm_ops->prepare()" call, which 
sets the wakup vector etc etc.

So maybe the real problem here is that once we've done the "->prepare()" 
call and ACPI has set up various stuff, we MUST NOT do any calls to any 
ACPI routines to set low-power states, because the stupid firmware isn't 
expecting it.

Now, if this is the cause, then I think your patch should indeed fix it, 
since you get called by the early-suspend code (which happens *before* the 
"->prepare()" call), but at the same time, I wonder if maybe it would be 
slightly "more correct" to instead of using the suspend/resume callbacks, 
simply do this in the "acpi_pm_prepare()" stage, since that is likely the 
thing that triggers it?

But hey, I think I'll apply the patches as-is. I'd just feel even better 
if we actually understood *why* doing the CPU Cx states is not something 
we can do around the suspend code!

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/