Message-ID: <YUuGB6pJZRlE4yPb@agluck-desk2.amr.corp.intel.com>
Date: Wed, 22 Sep 2021 12:37:43 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: Yang Shi <shy828301@...il.com>
Cc: naoya.horiguchi@....com, osalvador@...e.de, tdmackey@...tter.com,
david@...hat.com, willy@...radead.org, akpm@...ux-foundation.org,
corbet@....net, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> Currently just very simple message is shown for unhandlable page, e.g.
> non-LRU page, like:
> soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
>
> It is not very helpful for further debug, calling dump_page() could show
> more useful information.
Looks like your code already caught something. An error injection
test may have injected an error into a page belonging to a shared
library. That diagnosis comes from the author of the test, though, and
I'm not sure the refcount/mapcount in the dump agrees with it (the dump
shows mapcount:0 and a NULL mapping).
Here's what appeared on the console:
[ 4817.622254] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
[ 4817.638651] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
[ 4818.025515] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
[ 4818.272435] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 4818.280658] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.313606] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.321804] page dumped because: hwpoison: unhandlable page
[ 4818.564802] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.573043] Memory failure: 0xcef2747: recovery action for unknown page: Ignored
[ 4818.595837] Memory failure: 0xcef2747: already hardware poisoned
[ 4818.603245] Memory failure: 0xcef2747: Sending SIGBUS to multichase:67460 due to hardware memory corruption
[ 4818.614297] Memory failure: 0xcef2747: already hardware poisoned
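
Since the dump_page() output is interleaved with the repeated mce
messages, it is easier to read once the two streams are separated. A
minimal sketch in Python that does this for the log above; it is not part
of the patch or of the test, the names split_log and LINE_RE are made up
for illustration, and the pfn check assumes 4 KiB pages (a shift of 12):

```python
import re

# Console lines copied verbatim from the log above.
LOG = """\
[ 4817.622254] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
[ 4817.638651] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
[ 4818.025515] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
[ 4818.272435] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 4818.280658] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.313606] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.321804] page dumped because: hwpoison: unhandlable page
[ 4818.564802] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.573043] Memory failure: 0xcef2747: recovery action for unknown page: Ignored
[ 4818.595837] Memory failure: 0xcef2747: already hardware poisoned
[ 4818.603245] Memory failure: 0xcef2747: Sending SIGBUS to multichase:67460 due to hardware memory corruption
[ 4818.614297] Memory failure: 0xcef2747: already hardware poisoned
"""

# "[ seconds.micros] message" as printed on the console.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def split_log(text):
    """Return (mce_lines, other_lines) as lists of (timestamp, message)."""
    mce, other = [], []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a timestamped console line
        entry = (float(m.group(1)), m.group(2))
        (mce if entry[1].startswith("mce:") else other).append(entry)
    return mce, other


mce_lines, dump_lines = split_log(LOG)

# Pull refcount, mapcount and pfn out of the "page:" line of the dump.
page_msg = next(msg for _, msg in dump_lines if msg.startswith("page:"))
fields = re.search(r"refcount:(-?\d+) mapcount:(-?\d+) .*pfn:(0x[0-9a-f]+)", page_msg)
refcount, mapcount = int(fields.group(1)), int(fields.group(2))
pfn = int(fields.group(3), 16)

# Physical address reported by the first mce line.
mce_addr = int(re.search(r"at ([0-9a-f]+)$", mce_lines[0][1]).group(1), 16)

# Assumption: 4 KiB pages, so the page frame number is the address >> 12.
same_page = (mce_addr >> 12) == pfn

for ts, msg in dump_lines:
    print(f"[{ts:12.6f}] {msg}")
```

With the mce lines filtered out, what remains is the dump_page() output
followed by the "Memory failure:" messages in order, and same_page tells
whether the address in the mce messages falls in the dumped page.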
-Tony