Message-ID: <20220131054432.GA856839@hori.linux.bs1.fc.nec.co.jp>
Date: Mon, 31 Jan 2022 05:44:35 +0000
From: HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@....com>
To: Matthew Wilcox <willy@...radead.org>
CC: David Rientjes <rientjes@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm/hwpoison: Check the subpage, not the head page
On Sun, Jan 30, 2022 at 09:14:21PM +0000, Matthew Wilcox wrote:
> On Sun, Jan 30, 2022 at 12:58:17PM -0800, David Rientjes wrote:
> > On Sun, 30 Jan 2022, Matthew Wilcox (Oracle) wrote:
> >
> > > Hardware poison is tracked on a per-page basis, not on the head page.
> > >
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> > > ---
> > > mm/rmap.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 6a1e8c7f6213..09b08888120e 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1553,7 +1553,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> > > /* Update high watermark before we lower rss */
> > > update_hiwater_rss(mm);
> > >
> > > - if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
> > > + if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
> > > pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
> > > if (PageHuge(page)) {
> > > hugetlb_count_sub(compound_nr(page), mm);
> > > @@ -1873,7 +1873,7 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
> > > * memory are supported.
> > > */
> > > subpage = page;
> > > - } else if (PageHWPoison(page)) {
> > > + } else if (PageHWPoison(subpage)) {
> > > pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
> > > if (PageHuge(page)) {
> > > hugetlb_count_sub(compound_nr(page), mm);
> >
> > This looks correct. Correct me if I'm wrong that this is for consistency
> > and cleanup and that there is no bug being fixed by this, however.
>
> Oh, no, I think there's a real bug here. It's just that we're looking
> at an uncommon & hence rarely-tested scenario -- a memory fault in the
> middle of a THP (in mainline; obviously it'll be a little more common
> with arbitrary sized folios). I don't do HWPoison testing myself, so
> this was by inspection and not from testing. A scenario where things
> would go wrong is a memory error on a non-head-page would go unnoticed
> when migrating or unmapping. Contrariwise, if there's a hardware error
> on a head page, all the subpages get treated as poisoned, even though
> they shouldn't be.
Thank you for reporting. As you point out, the current check does not
handle THP properly. The reason for checking the head page here is to handle
hwpoisoned hugetlb pages (which have PG_hwpoison set on the head page even if
the error is on one of the tail pages). So I think the proper fix is to add a
helper function that checks the page type (normal, THP, or hugetlb) as well as
PageHWPoison.
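To make the idea concrete, here is a rough user-space sketch of the decision such a helper would make. It is not kernel code and has not been tested against a kernel tree: struct model_page, enum page_kind, model_page_hwpoisoned() and model_init_compound() are all hypothetical names invented for illustration, and the bool field merely stands in for PG_hwpoison. The only behaviour it encodes is the one described above: for hugetlb the flag is read from the head page, otherwise from the subpage itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model only: these are NOT the kernel's types or flags. */
enum page_kind { KIND_NORMAL, KIND_THP, KIND_HUGETLB };

struct model_page {
	enum page_kind kind;     /* kind of (compound) page this belongs to */
	bool hwpoison;           /* stands in for PG_hwpoison on this page */
	struct model_page *head; /* head page; a head or normal page points to itself */
};

/*
 * Hypothetical helper: pick the page whose poison flag is authoritative.
 *  - hugetlb: the flag lives on the head page, even for a tail-page error
 *  - THP and normal pages: the flag lives on the page that has the error
 */
static bool model_page_hwpoisoned(const struct model_page *subpage)
{
	if (subpage->kind == KIND_HUGETLB)
		return subpage->head->hwpoison;
	return subpage->hwpoison;
}

/* Set up nr pages as one unpoisoned compound page of the given kind. */
static void model_init_compound(struct model_page *pages, size_t nr,
				enum page_kind kind)
{
	for (size_t i = 0; i < nr; i++) {
		pages[i].kind = kind;
		pages[i].hwpoison = false;
		pages[i].head = &pages[0];
	}
}

/* Walks through the two cases discussed in this thread. */
static void model_example(void)
{
	struct model_page thp[4], huge[4];

	/* THP with an error in the middle: only that subpage is poisoned. */
	model_init_compound(thp, 4, KIND_THP);
	thp[2].hwpoison = true;
	assert(model_page_hwpoisoned(&thp[2]));
	assert(!model_page_hwpoisoned(&thp[0]));

	/* hugetlb: the flag on the head page covers every subpage. */
	model_init_compound(huge, 4, KIND_HUGETLB);
	huge[0].hwpoison = true;
	assert(model_page_hwpoisoned(&huge[3]));
}
```

In try_to_unmap_one() and try_to_migrate_one() a real helper would presumably take the place of the bare PageHWPoison() test, but how it distinguishes the page types in the kernel is left open here.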
Thanks,
Naoya Horiguchi