Message-ID: <YV4Dz3y4NXhtqd6V@t490s>
Date: Wed, 6 Oct 2021 16:15:11 -0400
From: Peter Xu <peterx@...hat.com>
To: Yang Shi <shy828301@...il.com>
Cc: naoya.horiguchi@....com, hughd@...gle.com,
kirill.shutemov@...ux.intel.com, willy@...radead.org,
osalvador@...e.de, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [v3 PATCH 2/5] mm: filemap: check if THP has hwpoisoned subpage
for PMD page fault
On Thu, Sep 30, 2021 at 02:53:08PM -0700, Yang Shi wrote:
> @@ -1148,8 +1148,12 @@ static int __get_hwpoison_page(struct page *page)
>  		return -EBUSY;
>
>  	if (get_page_unless_zero(head)) {
> -		if (head == compound_head(page))
> +		if (head == compound_head(page)) {
> +			if (PageTransHuge(head))
> +				SetPageHasHWPoisoned(head);
> +
>  			return 1;
> +		}
>
>  		pr_info("Memory failure: %#lx cannot catch tail\n",
>  			page_to_pfn(page));
Sorry for the late comments.
I'm wondering whether it's ideal to set this bit here, as get_hwpoison_page()
sounds like a pure helper that takes a refcount on a sane hwpoisoned page. I'm
afraid of a side effect where we set the bit without noticing, so I'm also
wondering whether we should keep it in memory_failure().
Quoting the comment for get_hwpoison_page():
* get_hwpoison_page() takes a page refcount of an error page to handle memory
* error on it, after checking that the error page is in a well-defined state
* (defined as a page-type we can successfully handle the memory error on it,
* such as LRU page and hugetlb page).
For example, I see that both unpoison_memory() and soft_offline_page() call it
too. Does that mean we'll also set the bit even when we only want to inject an
unpoison event?
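
To make the concern concrete, here is a minimal userspace sketch, not kernel
code. Everything in it is a hypothetical stand-in: struct toy_page and the
toy_*() functions only model the idea of a shared refcount helper with and
without the flag-setting side effect, and they make no claim about how the
real unpoison_memory() or soft_offline_page() paths behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a compound head page; NOT the kernel's struct page. */
struct toy_page {
	int refcount;
	bool is_thp;		/* models PageTransHuge() */
	bool has_hwpoisoned;	/* models the PageHasHWPoisoned flag */
};

/* Variant A: the shared helper sets the flag itself, as in the quoted hunk. */
int toy_get_ref_sets_flag(struct toy_page *head)
{
	assert(head != NULL);
	if (head->refcount == 0)	/* models get_page_unless_zero() failing */
		return 0;
	head->refcount++;
	if (head->is_thp)
		head->has_hwpoisoned = true;	/* side effect seen by every caller */
	return 1;
}

/* Variant B: the helper only takes the reference and touches no flags. */
int toy_get_ref(struct toy_page *head)
{
	assert(head != NULL);
	if (head->refcount == 0)
		return 0;
	head->refcount++;
	return 1;
}

/* Models the memory_failure() path: the one caller that marks the THP. */
int toy_memory_failure(struct toy_page *head)
{
	if (!toy_get_ref(head))
		return -1;
	if (head->is_thp)
		head->has_hwpoisoned = true;
	return 0;
}

/* Models an unrelated caller (e.g. an unpoison path) built on variant A. */
int toy_unpoison_with_a(struct toy_page *head)
{
	return toy_get_ref_sets_flag(head) ? 0 : -1;
}

/* The same unrelated caller built on variant B. */
int toy_unpoison_with_b(struct toy_page *head)
{
	return toy_get_ref(head) ? 0 : -1;
}
```

With variant A, the unpoison-like caller ends up setting has_hwpoisoned on a
THP even though it never meant to; with variant B the flag changes only when
the toy_memory_failure() path runs.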
Thanks,
--
Peter Xu