linux-kernel - Re: Unreliable hibernation on Lenovo x230 (regression)


Date:	Fri, 3 Apr 2015 18:00:26 +0200
From:	rhn <kebuac.rhn@...cupinefactory.org>
To:	joeyli <jlee@...e.com>
Cc:	rhn <kebuac.rhn@...cupinefactory.org>, Pavel Machek <pavel@....cz>,
	kernel list <linux-kernel@...r.kernel.org>,
	joeyli.kernel@...il.com, linux-pm@...r.kernel.org,
	"Rafael J. Wysocki" <rjw@...ysocki.net>
Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Fri, 3 Apr 2015 09:23:35 +0800
joeyli <jlee@...e.com> wrote:

> On Thu, Apr 02, 2015 at 08:12:00PM +0200, rhn wrote:
> > On Fri, 3 Apr 2015 01:22:21 +0800
> > joeyli <jlee@...e.com> wrote:
> > 
> > > On Fri, Apr 03, 2015 at 12:50:54AM +0800, joeyli wrote:
> > > > Hi, 
> > > > 
> > > > On Thu, Apr 02, 2015 at 05:28:05PM +0200, Pavel Machek wrote:
> > > > > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > > > > > 
> > > > > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > > > > > 
> > > > > > I have tracked the problem to first appear in the commit
> > > > > > e67ee10190e69332f929bdd6594a312363321a66	Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > > > > 
> > > > > > > The problem itself manifests in dmesg as follows (system was first
> > > > > > > restarted, then hibernated - this log is from the subsequent resume):
> > > > > 
> > > > > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > > > > reproduces?
> > > > > 
> > > > > At that point, this is the candidate:
> > > > > 
> > > > > commit e67ee10190e69332f929bdd6594a312363321a66
> > > > > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > > > > Author: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > > > > Date:   Mon Aug 11 23:19:48 2014 +0200
> > > > > 
> > > > >     Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > > > 
> > > > >     * pm-sleep:
> > > > >           PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > > > 
> > > > > ...
> > > > > Alternatively, you can just try to revert
> > > > > 
> > > > > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > > > > Author: Lee, Chun-Yi <joeyli.kernel@...il.com>
> > > > > Date:   Mon Aug 4 23:23:21 2014 +0800
> > > > > 
> > > > >     PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > > > 
> > > > > >     When the machine doesn't well handle the e820 persistent when
> > > > > >     hibernate resuming, then it may cause page fault when writing
> > > > > >     image to snapshot buffer:
> > > > > 
> > > > > 
> > > > > ...
> > > > > 
> > > > > Thanks,
> > > > > 									Pavel
> > > > 
> > > > Before reverting the 84c91b7ae patch, please check whether dmesg shows a
> > > > log line similar to the following when the hibernate resume fails:
> > > > 
> > > > [   24.349777] PM: 0xab9bc000 in e820 nosave region: [mem 0xab9bc000-0xab9c2fff]
> > > > 
> > > > The address may differ, but you should see the "e820 nosave region" log.
> > > > Otherwise we have a different problem.
> > > >
> > > 
> > > Forgot to mention, please add "debug no_console_suspend=1 loglevel=9" to the
> > > kernel parameters, then try to reproduce the issue and look at dmesg.
> > > 
> > > 
> > > Thanks a lot!
> > > Joey Lee 
> > 
> > Yes, it's present in dmesg when hibernate fails (default kernel params):
> > [    3.138824] PM: 0x9d3d3000 in e820 nosave region: [mem 0x9d3d3000-0x9d3d3fff]
> >
> 
> OK, then the message means that address 0x9d3d3000 was used by the image
> kernel but falls in an e820 nosave region of the current boot. We need to
> check whether this e820 region is used by setup_data and is therefore
> reserved as E820_RESERVED_KERN.
> 
> I need your complete dmesg to verify the e820 table. If the above assumption
> is true, then Yinghai Lu's patchset could fix this problem:
> 
> x86: Kill E820_RESERVED_KERN
> https://lkml.org/lkml/2015/3/4/434
> 
> The target kernel version for merging his patches is v4.1.
>  
> > I probably didn't make it clear - the top dmesg in my original message was from failed resume.
> > 
> > Cheers,
> > rhn
> 
> On the other hand, could you please check whether you are using platform
> mode to turn off the machine when hibernating?
> 
> $ cat /sys/power/disk
> [platform] shutdown reboot suspend
> 
> And, if possible, please file a bug on bugzilla.kernel.org and give me the
> bug number. I prefer to collect logs and debugging history in bugzilla for
> further tracking.
> 
> 
> Thanks a lot!
> Joey Lee

Yes, platform mode was used in all instances - both working and broken kernels.
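
For anyone who wants to script the same check, here is a minimal sketch that reads the bracketed entry from /sys/power/disk, as in the `cat` output quoted above. The function names are my own and purely illustrative; the only assumption about the file is that the active mode is the one shown in square brackets.

```python
import re


def active_hibernation_mode(text):
    """Return the mode shown in square brackets, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_hibernation_mode(path="/sys/power/disk"):
    """Read the sysfs file and return the currently selected mode."""
    with open(path) as handle:
        return active_hibernation_mode(handle.read())
```

Calling `read_hibernation_mode()` on a machine configured like the one quoted above should return "platform".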

I included full dmesg in the bug report on bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=96111
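
The "e820 nosave region" line discussed earlier in this thread can be pulled out of a saved dmesg with a short script. This is only a sketch: the function name and the regular expression are my own, written against the two log lines quoted in this thread, and other kernel versions may format the message differently.

```python
import re

# Pattern modelled on the lines quoted in this thread, e.g.
# "PM: 0x9d3d3000 in e820 nosave region: [mem 0x9d3d3000-0x9d3d3fff]"
NOSAVE_RE = re.compile(
    r"PM: (0x[0-9a-fA-F]+) in e820 nosave region: "
    r"\[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]"
)


def find_nosave_hits(log_text):
    """Return (address, region_start, region_end) tuples as integers."""
    hits = []
    for line in log_text.splitlines():
        match = NOSAVE_RE.search(line)
        if match:
            hits.append(tuple(int(group, 16) for group in match.groups()))
    return hits
```

An empty result on a failed resume would suggest a different problem than the one discussed here.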

Cheers,
rhn