lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210225123806.GA15006@hori.linux.bs1.fc.nec.co.jp>
Date:   Thu, 25 Feb 2021 12:38:06 +0000
From:   HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>
To:     Oscar Salvador <osalvador@...e.de>
CC:     Aili Yao <yaoaili@...gsoft.com>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "david@...hat.com" <david@...hat.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "bp@...en8.de" <bp@...en8.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        "inux-edac@...r.kernel.org" <inux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "yangfeng1@...gsoft.com" <yangfeng1@...gsoft.com>
Subject: Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

On Thu, Feb 25, 2021 at 12:39:30PM +0100, Oscar Salvador wrote:
> On Thu, Feb 25, 2021 at 11:28:18AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > Hi Aili,
> > 
> > I agree that this set_mce_nospec() is not expected to be called for
> > "already hwpoisoned" page because in the reported case the error
> > page is already contained and no need to resort changing cache mode.
> 
> Out of curiosity, what is the current behavour now?
> Say we have an ongoing MCE which has marked the page as HWPoison but
> memory_failure did not take any action on the page yet.
> And then, we have another MCE, which ends up there.
> set_mce_nospec might clear _PAGE_PRESENT bit.
> 
> Does that have any impact on the first MCE?

Hi Oscar,

Thank you for shedding light on this, this race looks worrisome to me.
We call try_to_unmap() inside memory_failure(), where we find affected
ptes by page_vma_mapped_walk() and convert into hwpoison entires in
try_to_unmap_one().  So there seems two racy cases:

  1)
     CPU 0                          CPU 1
     page_vma_mapped_walk
                                    clear _PAGE_PRESENT bit
       // skipped the entry

  2)
     CPU 0                          CPU 1
     page_vma_mapped_walk
       try_to_unmap_one
                                    clear _PAGE_PRESENT bit
         convert the entry
         set_pte_at

In case 1, the affected processes get signals on later access,
so although the info in SIGBUS could be different, that's OK.
And we have no impact in case 2.

Thanks,
Naoya Horiguchi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ