Message-ID: <7552351.MYV7GZcH0A@vostro.rjw.lan>
Date:   Tue, 30 Aug 2016 14:04:05 +0200
From:   "Rafael J. Wysocki" <rjw@...ysocki.net>
To:     Pavel Machek <pavel@....cz>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Borislav Petkov <bp@...en8.de>, Chen Yu <yu.c.chen@...el.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        the arch/x86 maintainers <x86@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Lee, Chun-Yi" <jlee@...e.com>
Subject: Re: [PATCH][v8] PM / hibernate: Verify the consistent of e820 memory map by md5 value

On Monday, August 29, 2016 05:13:34 PM Pavel Machek wrote:
> On Mon 2016-08-29 15:41:34, Rafael J. Wysocki wrote:
> > On Mon, Aug 29, 2016 at 6:59 AM, Borislav Petkov <bp@...en8.de> wrote:
> > > On Mon, Aug 29, 2016 at 12:35:40AM +0800, Chen Yu wrote:
> > >> On some platforms, there is occasional panic triggered when trying to
> > >> resume from hibernation, a typical panic looks like:
> > >>
> > >> "BUG: unable to handle kernel paging request at ffff880085894000
> > >> IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
> > >>
> > >> This is because e820 map has been changed by BIOS across
> > >> hibernation, and one of the page frames from first kernel
> > >> is right located in second kernel's unmapped region, so panic
> > >> comes out when accessing unmapped kernel address.
> > >>
> > >> In order to expose this issue earlier, the md5 hash of e820 map
> > >> is passed from suspend kernel to resume kernel, and the system will
> > >> trigger panic once it finds the md5 value of previous kernel is not
> > >> the same as current resume kernel.
> > >
> > > ... so basically now even the cases where it managed to resume would
> > > panic because the digests differ, even if the original panic condition
> > > doesn't trigger the bug, i.e. your Note 1 below.
> > >
> > > The more important question IMHO would be, can we resume our system
> > > successfully *even* if BIOS fiddled with the e820 map?
> > >
> > > We'd still warn the hell out of it and even make that the md5 digest
> > > comparison a default-enabled thing without even having a config option
> > > to disable it but can we try harder not to panic and deal with this next
> > > BIOS f*ckup more intelligently than throwing our hands in the air and
> > > giving up?
> > 
> > We need not panic in principle and I wouldn't do that.
> > 
> > I would warn and try to continue regardless (which was the original
> > plan here AFAICS), or we change a possible data loss into a guaranteed
> > one.
> > 
> > IMO it is sufficient to give up when a PFN we have image data for is
> > not pfn_valid() during resume, which we do already.
> 
> Well... can you guarantee what will be effect of resuming with
> different memory map?
> 
> Because there's big difference between panic and trying to continue
> with corrupted memory.

If all of the page frames the image kernel used before hibernation are
available during resume as well, memory won't really get corrupted, at least
not right away.

There may be problems going forward, but whether or not they actually happen
depends on what the differences are.  So while an e820 mismatch indicates that
things may go wrong, it doesn't necessarily mean that they will.

Also, that panic() may break hibernation, in a hard and nasty way, on systems
where it used to work flawlessly, and that would be a regression, so it is not
really acceptable.
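
To make the policy I'm arguing for concrete, here is a minimal user-space
sketch in C. It is not kernel code and not what the patch does: struct
ram_range, model_pfn_valid() and check_resume() are hypothetical names made up
for illustration, the memory map is reduced to a list of PFN ranges, and the
two maps are compared directly instead of through an md5 digest. The only
point is the ordering of the checks: give up only when an image PFN is not
valid in the current map, and merely warn when the maps differ but every
image PFN is still backed by memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for one usable-RAM entry of a memory map. */
struct ram_range {
	unsigned long start_pfn;	/* first page frame, inclusive */
	unsigned long end_pfn;		/* last page frame, exclusive */
};

enum resume_verdict { RESUME_OK, RESUME_WARN, RESUME_ABORT };

/* Model of pfn_valid(): true if pfn lies inside any usable range. */
bool model_pfn_valid(const struct ram_range *map, size_t n, unsigned long pfn)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (pfn >= map[i].start_pfn && pfn < map[i].end_pfn)
			return true;
	return false;
}

/* Direct comparison; an assumption standing in for the digest comparison. */
bool maps_equal(const struct ram_range *a, size_t na,
		const struct ram_range *b, size_t nb)
{
	if (na != nb)
		return false;
	for (size_t i = 0; i < na; i++)
		if (a[i].start_pfn != b[i].start_pfn ||
		    a[i].end_pfn != b[i].end_pfn)
			return false;
	return true;
}

/*
 * Abort only if some page frame we have image data for is not valid now.
 * A map mismatch alone produces a warning and resume continues.
 */
enum resume_verdict check_resume(const struct ram_range *saved, size_t n_saved,
				 const struct ram_range *cur, size_t n_cur,
				 const unsigned long *image_pfns, size_t n_pfns)
{
	for (size_t i = 0; i < n_pfns; i++)
		if (!model_pfn_valid(cur, n_cur, image_pfns[i]))
			return RESUME_ABORT;

	if (!maps_equal(saved, n_saved, cur, n_cur)) {
		fprintf(stderr, "warning: memory map changed across hibernation\n");
		return RESUME_WARN;
	}
	return RESUME_OK;
}
```

With a map whose second range shrinks from PFNs [200, 300) to [200, 250), an
image that only used PFNs below 250 gets RESUME_WARN in this model, while an
image that used PFN 260 gets RESUME_ABORT.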

Thanks,
Rafael
