Message-ID: <20200811193201.GA1410457@u2004>
Date: Wed, 12 Aug 2020 04:32:01 +0900
From: Naoya Horiguchi <nao.horiguchi@...il.com>
To: Qian Cai <cai@....pw>
Cc: HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@....com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"mhocko@...nel.org" <mhocko@...nel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"mike.kravetz@...cle.com" <mike.kravetz@...cle.com>,
"osalvador@...e.de" <osalvador@...e.de>,
"tony.luck@...el.com" <tony.luck@...el.com>,
"david@...hat.com" <david@...hat.com>,
"aneesh.kumar@...ux.vnet.ibm.com" <aneesh.kumar@...ux.vnet.ibm.com>,
"zeil@...dex-team.ru" <zeil@...dex-team.ru>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"will@...nel.org" <will@...nel.org>
Subject: Re: [PATCH v6 00/12] HWPOISON: soft offline rework

On Tue, Aug 11, 2020 at 01:39:24PM -0400, Qian Cai wrote:
> On Tue, Aug 11, 2020 at 03:11:40AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > I'm still not sure why the test succeeded by reverting these because
> > current mainline kernel provides similar mechanism to prevent reuse of
> > soft offlined page. So this success seems to me something suspicious.
> >
> > To investigate more, I want to have additional info about the page states
> > of the relevant pages after soft offlining. Could you collect it by the
> > following steps?
> >
> > - modify random.c not to run hotplug_memory() in migrate_huge_hotplug_memory(),
> > - compile it and run "./random 1" once,
> > - to collect page state with hwpoisoned pages, run "./page-types -Nlr -b hwpoison",
> > where page-types is available under tools/vm in kernel source tree.
> > - choose a few pfns of soft offlined pages from kernel message
> > "Soft offlining pfn ...", and run "./page-types -Nlr -a <pfn>".
>
> # ./page-types -Nlr -b hwpoison
> offset len flags
> 99a000 1 __________B________X_______________________
> 99c000 1 __________B________X_______________________
> 99e000 1 __________B________X_______________________
> 9a0000 1 __________B________X_______________________
> ba6000 1 __________B________X_______________________
> baa000 1 __________B________X_______________________

Thank you. The output shows only 6 records, which I did not expect,
because random.c soft-offlines 2 hugepages with madvise() 1000 times.
Perhaps the other hwpoisoned pages were somehow cleared (maybe in an
arch-specific way?). If they really were, the test only appears to
succeed, and this patchset can be considered a fix.
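
To make the comparison against the expected number of soft-offlined pages
less manual, something like the following could tally the records. This is
only a rough sketch: the function names are made up for illustration, the
bit positions (10 for buddy, 19 for hwpoison) are taken from the kernel's
kpageflags definitions, and I am assuming that both the offset and len
columns of "page-types -l" are printed in hex.

```python
import re

# Bit positions from include/uapi/linux/kernel-page-flags.h.
KPF_BUDDY = 10
KPF_HWPOISON = 19

# One record of "page-types -N -l" output: offset, len, flag string.
# Assumption: offset and len are both hexadecimal.
RECORD = re.compile(r"^([0-9a-f]+)\s+([0-9a-f]+)\s+(\S+)$")


def parse_page_types(text):
    """Return a list of (pfn, length, flags) tuples, skipping non-records."""
    records = []
    for line in text.splitlines():
        match = RECORD.match(line.strip())
        if match is None:
            continue  # header line ("offset len flags") or blank line
        pfn = int(match.group(1), 16)
        length = int(match.group(2), 16)
        records.append((pfn, length, match.group(3)))
    return records


def has_flag(flags, bit):
    """A flag is set when its column is anything other than '_'."""
    return bit < len(flags) and flags[bit] != "_"


def summarize(records):
    """Count records and pages, and how many pages are hwpoison or buddy."""
    summary = {"records": len(records), "pages": 0,
               "hwpoison_pages": 0, "buddy_pages": 0}
    for _pfn, length, flags in records:
        summary["pages"] += length
        if has_flag(flags, KPF_HWPOISON):
            summary["hwpoison_pages"] += length
        if has_flag(flags, KPF_BUDDY):
            summary["buddy_pages"] += length
    return summary
```

Feeding the captured output of "./page-types -Nlr -b hwpoison" to
summarize(parse_page_types(output)) gives the counts to compare with the
number of "Soft offlining pfn ..." lines in the kernel log.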
>
> Every single one of pfns was like this,
>
> # ./page-types -Nlr -a 0x99a000
> offset len flags
> 99a000 1 __________B________X_______________________
>
> # ./page-types -Nlr -a 0x99e000
> offset len flags
> 99e000 1 __________B________X_______________________
>
> # ./page-types -Nlr -a 0x99c000
> offset len flags
> 99c000 1 __________B________X_______________________