[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180717063619.GB1346@hori1.linux.bs1.fc.nec.co.jp>
Date: Tue, 17 Jul 2018 06:36:19 +0000
From: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To: Dan Williams <dan.j.williams@...el.com>
CC: "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
Jan Kara <jack@...e.cz>, Christoph Hellwig <hch@....de>,
Jérôme Glisse <jglisse@...hat.com>,
Matthew Wilcox <mawilcox@...rosoft.com>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 08/11] mm, memory_failure: Teach memory_failure()
about dev_pagemap pages
On Fri, Jul 13, 2018 at 05:28:05PM -0700, Dan Williams wrote:
> On Fri, Jul 13, 2018 at 1:52 AM, Naoya Horiguchi
> <n-horiguchi@...jp.nec.com> wrote:
> > On Wed, Jul 04, 2018 at 02:41:06PM -0700, Dan Williams wrote:
...
> >> +
> >> + /*
> >> + * Use this flag as an indication that the dax page has been
> >> + * remapped UC to prevent speculative consumption of poison.
> >> + */
> >> + SetPageHWPoison(page);
> >
> > The number of hwpoison pages is maintained by num_poisoned_pages,
> > so you can call num_poisoned_pages_inc()?
>
> I don't think we want these pages accounted in num_poisoned_pages().
> We have the badblocks infrastructure in libnvdimm to track how many
> errors and where they are located, and since they can be repaired via
> driver actions I think we should track them separately.
OK.
> > Related to this, I'm interested in whether/how unpoison_page() works
> > on a hwpoisoned dev_pagemap page.
>
> unpoison_page() is only triggered via freeing pages to the page
> allocator, and that never happens for dev_pagemap / ZONE_DEVICE pages.
sorry, my bad comment.
I meant unpoison_memory() in mm/memory-failure.c, which is triggered
via debugfs:hwpoison/unpoison-pfn. This interface looks like below
int unpoison_memory(unsigned long pfn)
{
struct page *page;
struct page *p;
int freeit = 0;
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
if (!pfn_valid(pfn))
return -ENXIO;
p = pfn_to_page(pfn);
page = compound_head(p);
if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
pfn, &unpoison_rs);
return 0;
}
...
so I think that we can add is_zone_device_page() check at the beginning
of this function to call hwpoison_clear() introduced in patch 13/13?
Otherwise maybe compound_head() will cause some critical issue like
general protection fault.
Thanks,
Naoya Horiguchi
Powered by blists - more mailing lists