lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220308065658.GA610534@hori.linux.bs1.fc.nec.co.jp>
Date:   Tue, 8 Mar 2022 06:56:59 +0000
From:   HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
CC:     Miaohe Lin <linmiaohe@...wei.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/4] mm/memory-failure.c: fix race with changing page
 compound again

On Mon, Mar 07, 2022 at 11:07:32AM -0800, Mike Kravetz wrote:
...
> > 
> >> +
> >> +       /**
> >> +        * The page could have changed compound pages due to race window.
> >> +        * If this happens just bail out.
> >> +        */
> >> +       if (!PageHuge(p) || compound_head(p) != head) {
> >> +               action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
> >> +               res = -EBUSY;
> >> +               goto out;
> >> +       }
> > 
> > Let me have one comment on the diff. The result code MF_MSG_DIFFERENT_COMPOUND
> > might not fit when PageHuge is false in the check (because it's no longer a
> > compound page).  Maybe you may invent another result code, or changes
> > MF_MSG_DIFFERENT_COMPOUND (for example) to MF_MSG_DIFFERENT_PAGE_SIZE?
> > 
> 
> Suppose we do encounter this race.  Also, suppose p != head.
> At the beginning of memory_failure_hugetlb, we do:
> 
> struct page *head = compound_head(p);
> ...
> if (TestSetPageHWPoison(head))
> 
> So, it could be that we set Poison in the 'head' page but the error was really
> in another page.  Is that correct?
> 
> Now with the race, head is not a huge page and the pages could even be on
> buddy.  Does this mean we could have poison set on the wrong page in buddy?

Correct, the race might be rare, but this needs a fix.
I think that setting PageHWPoison first (before taking refcount and page lock)
is the root of all related problems.  This behavior came from the original
concept in hwpoison that preventing consumption of corrupted data is the first
priority.  But now I think that this makes no sense if we have this kind of bugs.

I'll try to write a patch for this (I only fix memory_failure_hugetlb() first,
but generic path should be fixed later).
Thank you for pointing out.

- Naoya Horiguchi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ