Date:   Thu, 10 Mar 2022 00:00:25 +0000
From:   HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>
To:     Yang Shi <shy828301@...il.com>
CC:     Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
        Linux MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock
 in memory_failure_hugetlb()

On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote:
> On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi
> <naoya.horiguchi@...ux.dev> wrote:
> >
> > From: Naoya Horiguchi <naoya.horiguchi@....com>
> >
> > There is a race condition between memory_failure_hugetlb() and hugetlb
> > free/demotion, which causes the PageHWPoison flag to be set on the wrong
> > page (one that was a hugetlb page when memory_failure() was called, but was
> > freed or demoted by the time memory_failure_hugetlb() is called).  This
> > results in killing the wrong processes.  So set the PageHWPoison flag while
> > holding the page lock.
> >
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@....com>
> > ---
> >  mm/memory-failure.c | 27 ++++++++++++---------------
> >  1 file changed, 12 insertions(+), 15 deletions(-)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index ac6492e36978..fe25eee8f9d6 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >         int res;
> >         unsigned long page_flags;
> >
> > -       if (TestSetPageHWPoison(head)) {
> > -               pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > -                      pfn);
> > -               res = -EHWPOISON;
> > -               if (flags & MF_ACTION_REQUIRED)
> > -                       res = kill_accessing_process(current, page_to_pfn(head), flags);
> > -               return res;
> > -       }
> > -
> > -       num_poisoned_pages_inc();
> > -
> >         if (!(flags & MF_COUNT_INCREASED)) {
> >                 res = get_hwpoison_page(p, flags);
> 
> I'm not a hugetlb expert, so I may be wrong, but how does this solve
> the race? Is the race below still possible?
> 
> __get_hwpoison_page()
>   head = compound_head(page)
> 
> hugetlb demotion (1G --> 2M)
>   get_hwpoison_huge_page(head, &hugetlb);

Thanks for the comment.
I assume Miaohe's patch below introduces an additional check to detect the
race.  The patch calls compound_head() on the raw error page again, so
the demotion case should be detected.  I'll make the dependency clear in
the commit log.

https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huawei.com/

> 
> 
> Then the head may point to a 2M page, but the hwpoisoned subpage is
> not in that 2M range?
> 
> 
> >                 if (!res) {
> >                         lock_page(head);
> >                         if (hwpoison_filter(p)) {
> > -                               if (TestClearPageHWPoison(head))
> > -                                       num_poisoned_pages_dec();
> >                                 unlock_page(head);
> >                                 return -EOPNOTSUPP;
> >                         }
> > @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >         page_flags = head->flags;
> >
> >         if (hwpoison_filter(p)) {
> > -               if (TestClearPageHWPoison(head))
> > -                       num_poisoned_pages_dec();
> >                 put_page(p);
> >                 res = -EOPNOTSUPP;
> >                 goto out;
> >         }
> >
> > +       if (TestSetPageHWPoison(head))
> 
> And I don't think "head" is still the head you expected if the race
> happened. I think we need to re-retrieve the head once the page
> refcount is bumped and locked.

I think the justification above applies here as well.
When the kernel reaches this line, the hugepage is properly pinned and can no
longer be freed or demoted, so "head" still points to the expected head page.

Thanks,
Naoya Horiguchi

> 
> > +               goto already_hwpoisoned;
> > +
> > +       num_poisoned_pages_inc();
> > +
> >         /*
> >          * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
> >          * simply disable it. In order to make it work properly, we need
> > @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >  out:
> >         unlock_page(head);
> >         return res;
> > +already_hwpoisoned:
> > +       unlock_page(head);
> > +       pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > +       res = -EHWPOISON;
> > +       if (flags & MF_ACTION_REQUIRED)
> > +               res = kill_accessing_process(current, page_to_pfn(head), flags);
> > +       return res;
> >  }
> >
> >  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> > --
> > 2.25.1
> >
