linux-kernel - Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)


Message-Id: <200812041900.27514.elendil@planet.nl>
Date:	Thu, 4 Dec 2008 19:00:25 +0100
From:	Frans Pop <elendil@...net.nl>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	"Rafael J. Wysocki" <rjw@...k.pl>, Greg KH <greg@...ah.com>,
	Ingo Molnar <mingo@...e.hu>, jbarnes@...tuousgeek.org,
	lenb@...nel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	tiwai@...e.de, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)

On Thursday 04 December 2008, Linus Torvalds wrote:
> On Thu, 4 Dec 2008, Frans Pop wrote:
> > I've given your patch a try and the few resumes from STR I've done
> > were all successful. That's not 100% conclusive yet, but a nice
> > start. Some info from logs etc. below.
>
> Ok, but I thought you had a hard time reproducing this _anyway_, even
> with just plain -rc7. No?

Well, I had a failure rate of about 1 in 5-10 resumes originally.
See: http://bugzilla.kernel.org/show_bug.cgi?id=11545

Then I found the 2 workarounds and *with those in place* I got almost 100% 
reliable resumes. Now I've removed those workarounds and with either the 
revert or your oneliner I still get 100% success.
From my PoV that is a very definite improvement: the machine now "feels" a 
hell of a lot more reliable for critical use.

So I _could_ reproduce it reliably given enough suspend/resume cycles.
But I guess this does support your suspicion that it may be a timing 
issue: if the timing happens to be right, the resume succeeds; if it's 
wrong I get a dead box.
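
As a rough illustration of why "enough cycles" matters with a failure rate
of about 1 in 5-10: the sketch below estimates how many resumes are needed
before a clean run says anything. It assumes each resume fails
independently with a fixed probability, which is a simplification (a
timing-dependent bug need not behave that way), and the function names are
illustrative only.

```python
import math


def p_at_least_one_failure(p_fail: float, cycles: int) -> float:
    """Chance of seeing one or more failed resumes in `cycles` attempts,
    assuming independent attempts with per-resume failure rate `p_fail`."""
    return 1.0 - (1.0 - p_fail) ** cycles


def cycles_for_confidence(p_fail: float, confidence: float) -> int:
    """Smallest number of cycles after which a still-present bug would
    have shown up at least once with probability >= `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # The two ends of the "1 in 5-10" range quoted above.
    for p_fail in (1 / 5, 1 / 10):
        n = cycles_for_confidence(p_fail, 0.95)
        print(f"p_fail={p_fail:.2f}: {n} clean cycles for 95% confidence")
```

Under those assumptions, roughly 14 to 29 consecutive clean resumes are
needed before a fix can be called effective with 95% confidence, so a
handful of successful resumes is encouraging but not conclusive.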

> Since it's apparently STR, has anybody gotten _anything_ sane out of
> trying to enable PM_TRACE_RTC, and then doing that
>
> 	echo 1 > /sys/power/pm_trace

I did try that at the beginning. That's how I ended up removing e1000e 
before suspend. See http://bugzilla.kernel.org/show_bug.cgi?id=11545.

My next hint was that Matthew Garrett, who has the same notebook, was 
surprised at my resume problems as he did not see them. So I did a 
comparison of our kernel configs and made some changes to mine. From
that I found that a very low value for SND_HDA_POWER_SAVE_DEFAULT (5) 
reduced the failure rate to practically zero.

At some point I tried keeping e1000e loaded for a bit, but that quickly 
gave me a failure again, so I started removing it again during suspend.
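
The e1000e workaround amounts to unloading the module before suspend and
reloading it after resume. A minimal sketch of one such cycle follows,
assuming `modprobe` and the standard `/sys/power/state` and
`/sys/power/pm_trace` interfaces. The helper names and the dry-run default
are hypothetical and not taken from the bug report; running the commands
for real requires root.

```python
import subprocess


def str_cycle_commands(module: str = "e1000e", pm_trace: bool = False) -> list[str]:
    """Build the shell commands for one suspend-to-RAM cycle with
    `module` unloaded across the suspend."""
    cmds = []
    if pm_trace:
        # Optional: arm PM tracing (needs CONFIG_PM_TRACE_RTC).
        cmds.append("echo 1 > /sys/power/pm_trace")
    cmds.append(f"modprobe -r {module}")        # unload before suspend
    cmds.append("echo mem > /sys/power/state")  # blocks until resume
    cmds.append(f"modprobe {module}")           # reload after resume
    return cmds


def run_cycle(cmds: list[str], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(cmd)
        else:
            subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    run_cycle(str_cycle_commands())  # dry run: only prints the plan
```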

So I did have some data, but as I got no response on my bug report I had no idea 
where to go from there. I was really very happy to see Rafael's mail as 
his description almost exactly matched what I had been seeing.

I'd be happy to run with unpatched kernels for a while and do some more 
pm_traces, but only if someone is going to follow up and interpret the 
results for me or provide suggestions for targeted additional debugging.

Cheers,
FJP