linux-kernel - Re: [PATCH] [v4] PM / hibernate: Fix hibernation panic caused by inconsistent e820 map


Message-ID: <20150917204343.GA14658@amd>
Date:	Thu, 17 Sep 2015 22:43:43 +0200
From:	Pavel Machek <pavel@....cz>
To:	"Chen, Yu C" <yu.c.chen@...el.com>
Cc:	"rjw@...ysocki.net" <rjw@...ysocki.net>,
	"Brown, Len" <len.brown@...el.com>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Zhang, Rui" <rui.zhang@...el.com>,
	"jlee@...e.com" <jlee@...e.com>,
	"joeyli.kernel@...il.com" <joeyli.kernel@...il.com>,
	"yinghai@...nel.org" <yinghai@...nel.org>,
	Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH] [v4] PM / hibernate: Fix hibernation panic caused by
 inconsistent e820 map

Hi!

> > > This is also because BIOS provides different e820 memory map 
> > > before/after hibernation, and linux regards it as invalid process 
> > > and refuses to resume, in order to protect against data corruption.
> > > However, this check might be too strict, consider the following scenario:
> > 
> > Well... yes, the check is strict, but why is BIOS doing that? Can you 
> > fix it instead?
> Hmm, I synced with the BIOS team and got the answer that
> the e820 map is allocated dynamically each time the system boots,
> so it is possible that the BIOS shows a different map on each boot.
> Our BIOS team is currently working on it, but some problematic BIOSes
> have already been released, so I think Linux should deal with this situation.

Well.. you can't really deal with the situation. That's what confuses
me. If original kernel uses memory that is "not present" now, there's
nothing you can do... but panic / fail resume.
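
To make the point concrete, here is a minimal userspace sketch, not the kernel code. It assumes the image has been reduced to a list of page frame numbers and the current firmware map to a list of usable ranges; struct ram_range, pfn_is_usable() and check_image_pfns() are hypothetical names made up for this illustration. Any page frame that falls outside the current map leaves only one safe answer, which is to refuse the resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one usable-RAM entry of the firmware memory map. */
struct ram_range {
	unsigned long start_pfn;	/* first usable page frame */
	unsigned long end_pfn;		/* one past the last usable page frame */
};

/* Return true if pfn lies inside any usable range of the current map. */
static bool pfn_is_usable(const struct ram_range *map, size_t n,
			  unsigned long pfn)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= map[i].start_pfn && pfn < map[i].end_pfn)
			return true;
	return false;
}

/*
 * Return 0 if every page frame the image wants restored is usable now,
 * -1 otherwise. On failure, *bad_pfn holds the first offending frame.
 * Nothing is copied here: the check runs before any data page is touched.
 */
static int check_image_pfns(const struct ram_range *map, size_t n,
			    const unsigned long *pfns, size_t npfns,
			    unsigned long *bad_pfn)
{
	assert(bad_pfn != NULL);

	for (size_t i = 0; i < npfns; i++) {
		if (!pfn_is_usable(map, n, pfns[i])) {
			*bad_pfn = pfns[i];
			return -1;	/* caller must fail the resume */
		}
	}
	return 0;
}
```

A caller would run check_image_pfns() once all metadata has been read and abort the resume on -1. There is no branch in which the missing page can be "worked around".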

> > >    With v3 patch applied, I did 30 cycles on my problematic platform,
> > >    no panic triggered anymore(50% reproducible before patched, by
> > >    plugging/unplugging memory peripheral during hibernation), and it
> > >    just warns of invalid pages.
> > 
> > "Just warns of invalid pages". Do you want to say that you "just cause data
> > corruption"?
> > 
> > If you don't have enough memory, YOU DON'T RESTORE. Disks were synced,
> > so not restoring is safe. Running with memory corruption is NOT.
> >
> Sorry, I do not quite understand this scenario. Do you mean:
> "Without this patch, the memory consistency check happens at an early stage,
> just before the actual page restore, so it is a safe time for the system to
> decide whether to restore or terminate.
> With this patch applied, the check is put off to a later stage, which
> is not safe when memory is low?"
> 
> I think in this patch the memory size check has been moved
> a little later than its original place, but the check still happens before
> the actual restoring of image data pages.
> It happens once the last meta_page has been read:
> prepare_image->mark_unsafe_pages  (before the actual restoring of
>    data pages)

Aha, so you should still see some failures in the testing.



> > > +	if (!swsusp_page_is_valid(pfn_to_page(pfn))) {
> > > +		pr_err(
> > > +		"PM: Hibernation failed, address %#010llx to restored not
> > valid!\n",
> > > +			(unsigned long long) pfn << PAGE_SHIFT);
> > 
> > ...and still bad english.
> > 
> Oh, will fix it: 
> PM: Hibernation failed, address %#010llx to be restored is not valid!
> 
> Hope to hear from you, thanks!

Yes, that's better.
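
For reference, a small userspace sketch of how that message formats. It is an illustration only: DEMO_PAGE_SHIFT (4 KiB pages) and format_bad_address() are assumptions made up here, and snprintf() stands in for pr_err(). The cast binds tighter than the shift, so pfn is widened to 64 bits before shifting, which matters where unsigned long is 32 bits. With %#010llx the "0x" prefix counts toward the width of 10, so small addresses are zero-padded and large ones simply print longer.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Format the corrected message for the page frame pfn into buf.
 * Returns what snprintf() returns: the length of the full message.
 */
static int format_bad_address(char *buf, size_t len, unsigned long pfn)
{
	assert(buf != NULL && len > 0);

	/* The cast applies to pfn first, so the shift is done in 64 bits. */
	return snprintf(buf, len,
			"PM: Hibernation failed, address %#010llx to be restored is not valid!",
			(unsigned long long)pfn << DEMO_PAGE_SHIFT);
}
```

With these assumptions, pfn 0x1 prints as 0x00001000 and pfn 0x100000 prints as 0x100000000.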

But I still don't like the patch.

0) BIOS is broken, and this does not completely work around it. Users
will still see the failed hibernation when the memory that is now
unavailable was actually used.

1) It allocates bm3 even on systems that don't need the workaround
(arm, ia32)

2) If you use hibernation with a 32-bit kernel on an affected system,
you'll still get a panic.

3) I'm not sure I understand the changelog correctly. What happens
when the BIOS reports less memory on hibernation? Will you magically
remove memory from the kernel at runtime? Will /proc/meminfo be invalid
after resume? Will all the memory management tuning need fixing?

The changelog is really confusing. "failor" is not an English word.

After this patch applied, the panic will be replaced with the warning:

...

According to your explanation, the panic will be replaced with a resume
failure, not a mere warning.

I believe we have a case of "this BIOS problem cannot be reasonably
worked around" here.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html