Message-ID: <7362f9ee-81fa-702a-7a03-1a91ecf0b58e@oracle.com>
Date:   Wed, 16 Mar 2022 15:51:35 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Naoya Horiguchi <naoya.horiguchi@...ux.dev>, linux-mm@...ck.org
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Yang Shi <shy828301@...il.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] mm/hwpoison: fix race between hugetlb free/demotion
 and memory_failure_hugetlb()

On 3/16/22 05:07, Naoya Horiguchi wrote:
> From: Miaohe Lin <linmiaohe@...wei.com>
> 
> There is a race condition between memory_failure_hugetlb() and hugetlb
> free/demotion, which causes setting PageHWPoison flag on the wrong page.
> The one simple result is that wrong processes can be killed, but another
> (more serious) one is that the actual error is left unhandled, so no one
> prevents later access to it, and that might lead to more serious results
> like consuming corrupted data.
> 
> Think about the below race window:
> 
>   CPU 1                                   CPU 2
>   memory_failure_hugetlb
>   struct page *head = compound_head(p);
>                                           hugetlb page might be freed to
>                                           buddy, or even changed to another
>                                           compound page.
> 
>   get_hwpoison_page -- page is not what we want now...
> 
> The compound_head is called outside hugetlb_lock, so the head is not
> reliable.
> 
> So set PageHWPoison flag after passing prechecks. And to detect
> potential violation, this patch also introduces a new action type
> MF_MSG_DIFFERENT_PAGE_SIZE.

Thanks for squashing these patches.

In my testing, there is a change in behavior that may not be intended.

My test strategy is:
- allocate two hugetlb pages
- create a mapping which reserves those two pages, but does not fault them in
  - as a result, the pages are on the free list but cannot be freed
- inject error on a subpage of one of the huge pages
  - echo 0xYYY > /sys/kernel/debug/hwpoison/corrupt-pfn
- memory error code will call dissolve_free_huge_page
  - dissolve_free_huge_page returns -EBUSY because
    h->free_huge_pages - h->resv_huge_pages == 0
- We never end up setting Poison on either the subpage with the error or
  the head page
- The huge page is left sitting on the free list with an error in a subpage
  and is not marked as poisoned
- The huge page with the error could be given to an application or returned
  to buddy
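
To make the -EBUSY step concrete, here is a minimal userspace sketch of the
availability check quoted above. It is only a model under stated assumptions,
not the kernel code: struct hstate_model and model_dissolve_free_huge_page()
are hypothetical names, and only the two counters mentioned in the list are
represented.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two hstate counters named above. */
struct hstate_model {
	unsigned long free_huge_pages;	/* huge pages on the free list */
	unsigned long resv_huge_pages;	/* free pages promised to mappings */
};

/*
 * Model of the check only: returns 0 if a free huge page could be
 * dissolved, or -EBUSY if every free page is already reserved.
 */
static int model_dissolve_free_huge_page(struct hstate_model *h)
{
	/* Assumption of this model: reservations never exceed free pages. */
	assert(h->resv_huge_pages <= h->free_huge_pages);

	if (h->free_huge_pages - h->resv_huge_pages == 0)
		return -EBUSY;

	h->free_huge_pages--;
	return 0;
}
```

With two free pages and two reservations, as in the test above, the model
returns -EBUSY and leaves both pages on the free list.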

Prior to this change, Poison would be set on the head page.

I do not think this was an intended change in behavior.  But, perhaps it is
all we can do in this case?  Sorry for not being able to look more closely
at the code right now.   
-- 
Mike Kravetz
