Date:   Wed, 22 Sep 2021 13:37:12 -0700
From:   Yang Shi <shy828301@...il.com>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>, Oscar Salvador <osalvador@...e.de>,
        tdmackey@...tter.com, David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jonathan Corbet <corbet@....net>,
        Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page

On Wed, Sep 22, 2021 at 12:58 PM Yang Shi <shy828301@...il.com> wrote:
>
> On Wed, Sep 22, 2021 at 12:37 PM Luck, Tony <tony.luck@...el.com> wrote:
> >
> > On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> > > Currently only a very simple message is shown for an unhandlable page,
> > > e.g. a non-LRU page, like:
> > > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> > >
> > > That is not very helpful for further debugging; calling dump_page() could
> > > show more useful information.
> >
> > Looks like your code already caught something. An error injection
> > test may have injected into a shared library. Though I'm not sure that
> > the refcount/mapcount in the dump agrees with that diagnosis from the
> > author of this test.
>
> The messages from dump_page() are (untangled from the mce logs):
>
> [ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
> [ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> [ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
> [ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000

Missed one line from the dump:

[ 4818.321804] page dumped because: hwpoison: unhandlable page

Anyway, dump_page() is only called when an unhandlable page is encountered.
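
For anyone who wants to repeat the untangling of the dump_page() lines
from the interleaved console output quoted further down, here is a minimal
sketch in Python. It is not part of the patch. The list of "noise"
prefixes (mce: and Memory failure:) is an assumption that fits this
particular log only, and the names NOISE and dump_lines are made up for
illustration.

```python
import re

# A few console lines copied from the report quoted in this thread.
CONSOLE = """\
[ 4817.622254] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
[ 4817.638651] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
[ 4818.321804] page dumped because: hwpoison: unhandlable page
[ 4818.573043] Memory failure: 0xcef2747: recovery action for unknown page: Ignored
"""

# Lines with these prefixes are NOT dump_page() output in this log.
# Assumption: nothing else is interleaved; adjust for other logs.
NOISE = re.compile(r"^\[\s*\d+\.\d+\]\s+(mce:|Memory failure:)")


def dump_lines(text):
    """Return the console lines left after dropping the noise lines."""
    return [line for line in text.splitlines() if line and not NOISE.match(line)]


if __name__ == "__main__":
    for line in dump_lines(CONSOLE):
        print(line)
```

Filtering by prefix rather than by timestamp is deliberate: the mce and
dump_page() lines alternate, so the timestamps alone do not separate them.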

>
> The page flags say it is a "reserved" page and mapping is NULL. It
> doesn't look like a user page or a movable page, so hwpoison can't
> handle it, which is why the messages are dumped.
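
The reading of the dump above can be sketched as a small Python check.
This only mirrors the reasoning in this mail (reserved flag set, mapping
NULL); it is not the check the kernel performs, and parse_page,
parse_flag_names and looks_unhandlable are hypothetical helper names.

```python
import re

# The two dump_page() lines that matter here, copied from the log.
PAGE_LINE = ("page:000000003ab9dca4 refcount:1 mapcount:0 "
             "mapping:0000000000000000 index:0x0 pfn:0xcef2747")
FLAGS_LINE = ("flags: 0x57ffffc0801000"
              "(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)")


def parse_page(line):
    """Split the 'key:value' tokens of the 'page:' line into a dict of strings."""
    return dict(token.split(":", 1) for token in line.split())


def parse_flag_names(line):
    """Return the flag names inside the parentheses, skipping key=value fields."""
    inside = re.search(r"\((.*)\)", line).group(1)
    return {name for name in inside.split("|") if "=" not in name}


def looks_unhandlable(page, flags):
    """Illustration only: 'reserved' is set and mapping is NULL."""
    return "reserved" in flags and int(page["mapping"], 16) == 0


if __name__ == "__main__":
    page = parse_page(PAGE_LINE)
    flags = parse_flag_names(FLAGS_LINE)
    print(sorted(flags), page["mapping"], looks_unhandlable(page, flags))
```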
>
> >
> > Here's what appeared on the console:
> >
> > [ 4817.622254] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
> > [ 4817.638651] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> > [ 4818.025515] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
> > [ 4818.272435] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > [ 4818.280658] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.313606] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.321804] page dumped because: hwpoison: unhandlable page
> > [ 4818.564802] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.573043] Memory failure: 0xcef2747: recovery action for unknown page: Ignored
> > [ 4818.595837] Memory failure: 0xcef2747: already hardware poisoned
> > [ 4818.603245] Memory failure: 0xcef2747: Sending SIGBUS to multichase:67460 due to hardware memory corruption
> > [ 4818.614297] Memory failure: 0xcef2747: already hardware poisoned
> >
> > -Tony
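
As a sanity check that the mce lines and the dump refer to the same page:
the address in the mce messages (cef2747000) and the pfn in the dump
(0xcef2747) differ only by the page shift. A minimal sketch, assuming
4 KiB base pages (PAGE_SHIFT of 12, as on x86-64); pfn_of is a made-up
helper name.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB base pages, as on x86-64


def pfn_of(phys_addr):
    """Return the page frame number for a physical address."""
    return phys_addr >> PAGE_SHIFT


if __name__ == "__main__":
    addr = int("cef2747000", 16)  # address from the mce lines
    print(hex(pfn_of(addr)))      # compare with pfn: in the dump
```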
