Date:   Wed, 24 Aug 2016 09:36:10 +0800
From:   joeyli <jlee@...e.com>
To:     Chen Yu <yu.c.chen@...el.com>
Cc:     rjw@...ysocki.net, pavel@....cz, len.brown@...el.com,
        hpa@...or.com, mingo@...hat.com, tglx@...utronix.de,
        rui.zhang@...el.com, x86@...nel.org, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH][v6] PM / hibernate: Print the possible panic reason when
 resuming with inconsistent e820 map

On Tue, Aug 23, 2016 at 06:01:55PM +0800, Chen Yu wrote:
> Hi,
> thanks for your interest :)
> On Tue, Aug 23, 2016 at 05:45:27PM +0800, joeyli wrote:
> > Hi all, 
> > 
> > On Wed, Oct 21, 2015 at 01:21:40PM +0800, Chen Yu wrote:
> > > On some platforms, an occasional panic is triggered when trying to
> > > resume from hibernation. A typical panic looks like:
> > > 
> > > "BUG: unable to handle kernel paging request at ffff880085894000
> > > IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
> > > 
> > > This is because the e820 map has been changed by the BIOS between
> > > hibernation and resume, and one of the page frames from the first
> > > kernel lies right in the second kernel's unmapped region, so a panic
> > > occurs when that unmapped kernel address is accessed.
> > > 
> > > In order to tell the user why this happened, and for scalability,
> > > we introduce a framework (a new file named hibernation_e820.c) to
> > > compare the e820 maps before and after hibernation. If these two
> > > e820 maps are not compatible with each other, we print a
> > > warning with the first corrupt e820 entry's information
> > > (there might be more than one broken e820 entry) once the
> > > system panics, for example:
> > > 
> > > BUG: unable to handle kernel paging request at ffff8800a9688000
> > > IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70
> > > PM: Hibernation Caution! Oops might be due to inconsistent e820 table.
> > > PM: mem [0xa963b000-0xa963d000][ACPI Table] is an invalid old e820 region.
> > > PM: Inconsistent with current [mem 0xa963b000-0xa963e000][ACPI Table].
> > > PM: Please update your BIOS, or do not use hibernation on this machine.
> > > 
> > > The following kinds of e820 entries will be regarded as invalid:
> > > 1. E820_RAM: the old region is not a subset of any current region.
> > > 2. E820_ACPI: the old region is not exactly the same as any current
> > >    region (example above).
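The two rules above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the patch: the struct e820_region type, the REGION_* constants and the helpers region_is_subset(), old_region_is_valid() and first_invalid_region() are all hypothetical names. The type values 1 (RAM) and 3 (ACPI) follow the BIOS e820 address-range types; regions of any other type are left unchecked, matching the two rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified e820 entry; not the kernel's own definition. */
struct e820_region {
	uint64_t addr;   /* start physical address */
	uint64_t size;   /* length in bytes */
	uint32_t type;   /* region type */
};

/* Type codes as reported by the BIOS e820 interface. */
enum { REGION_RAM = 1, REGION_ACPI = 3 };

/* True if [a->addr, a->addr + a->size) lies inside [b->addr, b->addr + b->size).
 * Assumes addr + size does not overflow. */
static bool region_is_subset(const struct e820_region *a,
			     const struct e820_region *b)
{
	return a->addr >= b->addr && a->addr + a->size <= b->addr + b->size;
}

/* Rule 1: an old RAM region must lie inside some current RAM region.
 * Rule 2: an old ACPI region must match some current ACPI region exactly. */
static bool old_region_is_valid(const struct e820_region *old,
				const struct e820_region *cur, size_t ncur)
{
	for (size_t i = 0; i < ncur; i++) {
		if (cur[i].type != old->type)
			continue;
		if (old->type == REGION_RAM && region_is_subset(old, &cur[i]))
			return true;
		if (old->type == REGION_ACPI &&
		    old->addr == cur[i].addr && old->size == cur[i].size)
			return true;
	}
	/* Types other than RAM and ACPI are not covered by the rules. */
	return old->type != REGION_RAM && old->type != REGION_ACPI;
}

/* Index of the first invalid old entry, or -1 if all are compatible. */
static long first_invalid_region(const struct e820_region *old, size_t nold,
				 const struct e820_region *cur, size_t ncur)
{
	for (size_t i = 0; i < nold; i++)
		if (!old_region_is_valid(&old[i], cur, ncur))
			return (long)i;
	return -1;
}
```

With the ACPI entry from the log above (same start address, old size 0x2000, current size 0x3000), first_invalid_region() returns that entry's index, because rule 2 demands an exact match, whereas a RAM region that merely shrank inside a larger current RAM region would pass rule 1.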
> > > 
> > > Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> > > ---
> > > v6:
> > >  - Fix some compile errors reported by 0day/LKP; adjust
> > >    Kconfig/variable names.
> > > v5:
> > >  - Rewrite this patch to just warn the user about the broken BIOS
> > >    when a panic occurs.
> > > v4:
> > >  - Add __attribute__ ((unused)) for swsusp_page_is_valid,
> > >    to eliminate the warning:
> > >    'swsusp_page_is_valid' defined but not used
> > >    on non-x86 platforms.
> > > 
> > > v3:
> > >  - Adjust the logic to exclude the end_pfn boundary in pfn_mapped
> > >    when invoking mark_valid_pages: end_pfn is not a mapped page
> > >    frame, so it should not be regarded as a valid page.
> > > 
> > >    Move the sanity check of valid pages to an early stage of the resume
> > >    process (into mark_unsafe_pages); this way, we avoid unnecessarily
> > >    accessing these invalid pages at a later stage (yes, this moves it
> > >    back to the original position Joey introduced in
> > >    commit 84c91b7ae07c ("PM / hibernate: avoid unsafe pages in e820
> > >    reserved regions")).
> > > 
> > >    With the v3 patch applied, I did 30 cycles on my problematic platform
> > >    and no panic was triggered anymore (it was 50% reproducible before
> > >    the patch, by plugging/unplugging a memory peripheral during
> > >    hibernation); it just warns of invalid pages.
> > >    
> > > v2:
> > >  - Following Ingo's suggestion, rewrite this patch.
> > > 
> > >    The new version just checks each page frame against the pfn_mapped
> > >    array, so we do not need to touch existing code related to
> > >    E820_RESERVED_KERN. This method also naturally guarantees that
> > >    the system does not need to have the same memory size before and
> > >    after hibernation on x86_64.
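The pfn_mapped check from the v2 and v3 notes amounts to testing each saved page frame against a list of half-open ranges, where end_pfn is the first frame past the range. The sketch below is an illustration under assumptions, not the patch's mark_valid_pages() code: struct pfn_range, pfn_is_mapped() and count_unmapped_pfns() are hypothetical names, loosely modeled on the pfn_mapped array mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one pfn_mapped-style entry:
 * the half-open page-frame range [start_pfn, end_pfn). */
struct pfn_range {
	uint64_t start_pfn;
	uint64_t end_pfn;   /* first pfn NOT in the range */
};

/* True if pfn lies inside one of the mapped ranges; end_pfn itself is excluded. */
static bool pfn_is_mapped(uint64_t pfn, const struct pfn_range *map, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= map[i].start_pfn && pfn < map[i].end_pfn)
			return true;
	return false;
}

/* Count saved page frames that fall outside the current kernel's mapped ranges. */
static size_t count_unmapped_pfns(const uint64_t *saved, size_t nsaved,
				  const struct pfn_range *map, size_t n)
{
	size_t bad = 0;

	for (size_t i = 0; i < nsaved; i++)
		if (!pfn_is_mapped(saved[i], map, n))
			bad++;
	return bad;
}
```

The strict `<` comparison against end_pfn is the boundary fix described in v3: using `<=` would wrongly treat end_pfn as a mapped frame. Because only the frames actually saved in the image are checked, the two kernels need not see the same total memory size.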
> > 
> > What's the progress of this patch? It looks like experts have already
> > reviewed it. Why wasn't this patch accepted?
> This patch is a little overkill, and I have saved another, simpler
> version that only checks the md5 hash (as people suggested). I can post it later.
> 
> thanks,
> Yu
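The simpler approach Yu mentions, comparing a hash of the e820 map taken before hibernation with one taken at resume, could look roughly like this. MD5 is swapped here for the much shorter FNV-1a 64-bit hash purely to keep the sketch self-contained; where the saved digest would live (for example, in the hibernation image header) is an assumption, and struct e820_region, e820_digest() and e820_map_unchanged() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified e820 entry (not the kernel's definition). */
struct e820_region {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* Feed the low nbytes of v (least significant first) into an FNV-1a state. */
static uint64_t fnv1a_feed(uint64_t h, uint64_t v, int nbytes)
{
	for (int i = 0; i < nbytes; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 0x100000001b3ULL;      /* FNV-1a 64-bit prime */
	}
	return h;
}

/* Digest of the whole table; fields are hashed one by one so that
 * struct padding bytes never influence the result. */
static uint64_t e820_digest(const struct e820_region *map, size_t n)
{
	uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */

	for (size_t i = 0; i < n; i++) {
		h = fnv1a_feed(h, map[i].addr, 8);
		h = fnv1a_feed(h, map[i].size, 8);
		h = fnv1a_feed(h, map[i].type, 4);
	}
	return h;
}

/* At resume: compare the digest saved at hibernation time with the current map's. */
static bool e820_map_unchanged(uint64_t saved_digest,
			       const struct e820_region *cur, size_t n)
{
	return e820_digest(cur, n) == saved_digest;
}
```

Unlike the rule-based comparison in the v6 patch, a digest only reports that the map changed; it cannot say which entry changed or whether the change is harmless, which is the trade-off for the much smaller amount of code.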

I am happy to test and review it.

Thanks a lot!
Joey Lee
