linux-kernel - Re: [PATCH v2 8/9] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage

Message-ID: <db45e4bc-f09a-771a-f90d-448d46bacce1@huawei.com>
Date:   Mon, 27 Jun 2022 20:24:08 +0800
From:   Miaohe Lin <linmiaohe@...wei.com>
To:     Naoya Horiguchi <nao.horiguchi@...il.com>, <linux-mm@...ck.org>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Liu Shixin <liushixin2@...wei.com>,
        Yang Shi <shy828301@...il.com>,
        Oscar Salvador <osalvador@...e.de>,
        Muchun Song <songmuchun@...edance.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 8/9] mm, hwpoison: skip raw hwpoison page in freeing
 1GB hugepage

On 2022/6/24 7:51, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@....com>
> 
> Currently, if memory_failure() (modified by a subsequent patch to remove
> the blocking code) is called on a page in a 1GB hugepage, memory error
> handling fails and the raw error page is leaked.  The impact is small on
> production systems (a single 4kB page is leaked), but it limits
> testability because unpoison does not work for such a page.  A 1GB
> hugepage can no longer be created on a 1GB physical address range that
> contains a leaked page, which is inconvenient when testing on small
> systems.
> 
> When a hwpoison page in a 1GB hugepage is handled, it's caught by the
> PageHWPoison check in free_pages_prepare() because the 1GB hugepage is
> broken down into raw error pages before coming to this point:
> 
>         if (unlikely(PageHWPoison(page)) && !order) {
>                 ...
>                 return false;
>         }
> 
> As a result, the page is not sent to buddy and its refcount is left at 0.
> 
> Originally, this check was meant to handle an error page freed via
> page_handle_poison() (which is called from soft-offline), but we are now
> opening another path that reaches it, so the callers of
> __page_handle_poison() need to handle this case by treating a return value
> of 0 as success.  The refcount of the hwpoison page is then properly
> incremented, so unpoison works.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@....com>

It seems I misunderstood the commit log in [1]. But I hope I get the point this time. :)

Reviewed-by: Miaohe Lin <linmiaohe@...wei.com>

Thanks!

[1] https://lore.kernel.org/linux-mm/19981830-a5e6-bdba-4a1c-1cdcea61b93b@huawei.com/

> ---
>  mm/memory-failure.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index db85f644a1e3..fc7b83cb6468 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1046,7 +1046,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>  		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
>  		unlock_page(hpage);
>  	} else {
> -		res = MF_FAILED;
>  		unlock_page(hpage);
>  		/*
>  		 * migration entry prevents later access on error hugepage,
> @@ -1054,9 +1053,11 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>  		 * subpages.
>  		 */
>  		put_page(hpage);
> -		if (__page_handle_poison(p) > 0) {
> +		if (__page_handle_poison(p) >= 0) {
>  			page_ref_inc(p);
>  			res = MF_RECOVERED;
> +		} else {
> +			res = MF_FAILED;
>  		}
>  	}
>  
> @@ -1704,9 +1705,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  	 */
>  	if (res == 0) {
>  		unlock_page(head);
> -		if (__page_handle_poison(p) > 0) {
> +		if (__page_handle_poison(p) >= 0) {
>  			page_ref_inc(p);
>  			res = MF_RECOVERED;
> +		} else {
> +			res = MF_FAILED;
>  		}
>  		action_result(pfn, MF_MSG_FREE_HUGE, res);
>  		return res == MF_RECOVERED ? 0 : -EBUSY;
>
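
For readers following the diff, the caller-side change in both hunks can be
modelled in plain userspace C.  This is only a sketch under stated
assumptions, not kernel code: model_handle_poison_result(), the
model_mf_result enum, and the bare int refcount are hypothetical stand-ins
for the kernel's __page_handle_poison() return value, MF_FAILED /
MF_RECOVERED, and the page refcount bumped by page_ref_inc().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's MF_FAILED / MF_RECOVERED results. */
enum model_mf_result {
	MODEL_MF_FAILED,
	MODEL_MF_RECOVERED,
};

/*
 * Model of the caller-side handling after this patch.
 * `ret` stands for the value returned by __page_handle_poison();
 * `refcount` stands for the refcount of the raw error page.
 */
enum model_mf_result model_handle_poison_result(int ret, int *refcount)
{
	assert(refcount != NULL);

	if (ret >= 0) {
		/* 0 now counts as success too: take a reference on the page. */
		(*refcount)++;
		return MODEL_MF_RECOVERED;
	}

	/* A negative value is reported as failure; no reference is taken. */
	return MODEL_MF_FAILED;
}
```

With the old `> 0` comparison, a return value of 0 skipped the increment and
was not reported as recovered; per the commit log, that is the case hit when
free_pages_prepare() keeps the raw error page out of buddy.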