[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <db45e4bc-f09a-771a-f90d-448d46bacce1@huawei.com>
Date: Mon, 27 Jun 2022 20:24:08 +0800
From: Miaohe Lin <linmiaohe@...wei.com>
To: Naoya Horiguchi <nao.horiguchi@...il.com>, <linux-mm@...ck.org>
CC: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Liu Shixin <liushixin2@...wei.com>,
Yang Shi <shy828301@...il.com>,
Oscar Salvador <osalvador@...e.de>,
Muchun Song <songmuchun@...edance.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 8/9] mm, hwpoison: skip raw hwpoison page in freeing
1GB hugepage
On 2022/6/24 7:51, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@....com>
>
> Currently if memory_failure() (modified to remove blocking code with
> subsequent patch) is called on a page in some 1GB hugepage, memory error
> handling fails and the raw error page gets into leaked state. The impact
> is small in production systems (just leaked single 4kB page), but this
> limits the testability because unpoison doesn't work for it.
> We can no longer create 1GB hugepage on the 1GB physical address range
> with such leaked pages, that's not useful when testing on small systems.
>
> When a hwpoison page in a 1GB hugepage is handled, it's caught by the
> PageHWPoison check in free_pages_prepare() because the 1GB hugepage is
> broken down into raw error pages before coming to this point:
>
> if (unlikely(PageHWPoison(page)) && !order) {
> ...
> return false;
> }
>
> Then, the page is not sent to buddy and the page refcount is left 0.
>
> Originally this check is supposed to work when the error page is freed from
> page_handle_poison() (that is called from soft-offline), but now we are
> opening another path to call it, so the callers of __page_handle_poison()
> need to handle the case by considering the return value 0 as success. Then
> page refcount for hwpoison is properly incremented so unpoison works.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@....com>
It seems I misunderstand the commit log in [1]. But I hope I get the point this time. :)
Reviewed-by: Miaohe Lin <linmiaohe@...wei.com>
Thanks!
[1]https://lore.kernel.org/linux-mm/19981830-a5e6-bdba-4a1c-1cdcea61b93b@huawei.com/
> ---
> mm/memory-failure.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index db85f644a1e3..fc7b83cb6468 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1046,7 +1046,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
> res = truncate_error_page(hpage, page_to_pfn(p), mapping);
> unlock_page(hpage);
> } else {
> - res = MF_FAILED;
> unlock_page(hpage);
> /*
> * migration entry prevents later access on error hugepage,
> @@ -1054,9 +1053,11 @@ static int me_huge_page(struct page_state *ps, struct page *p)
> * subpages.
> */
> put_page(hpage);
> - if (__page_handle_poison(p) > 0) {
> + if (__page_handle_poison(p) >= 0) {
> page_ref_inc(p);
> res = MF_RECOVERED;
> + } else {
> + res = MF_FAILED;
> }
> }
>
> @@ -1704,9 +1705,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> */
> if (res == 0) {
> unlock_page(head);
> - if (__page_handle_poison(p) > 0) {
> + if (__page_handle_poison(p) >= 0) {
> page_ref_inc(p);
> res = MF_RECOVERED;
> + } else {
> + res = MF_FAILED;
> }
> action_result(pfn, MF_MSG_FREE_HUGE, res);
> return res == MF_RECOVERED ? 0 : -EBUSY;
>
Powered by blists - more mailing lists