[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210816175646.GA1600630@agluck-desk2.amr.corp.intel.com>
Date: Mon, 16 Aug 2021 10:56:46 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: Naoya Horiguchi <naoya.horiguchi@...ux.dev>
Cc: HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@....com>,
Naoya Horiguchi <nao.horiguchi@...il.com>,
Oscar Salvador <osalvador@...e.de>,
Muchun Song <songmuchun@...edance.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v6 1/2] mm,hwpoison: fix race with hugetlb page allocation
On Tue, Aug 17, 2021 at 02:12:07AM +0900, Naoya Horiguchi wrote:
> This dump indicates that HWPoisonHandlable() returned false due to
> the lack of PG_lru flag. In older code before 5.13, get_any_page() does
> retry with shake_page(), but does not since 5.13, which seems to me
> the root cause of the issue. So my suggestion is to call shake_page()
> when HWPoisonHandlable() is false.
>
> Could you try checking that the following diff fixes the issue?
> I could still have better fix (like inserting shake_page() to other
> retry paths in get_any_page()), but the below is the minimum one.
Tried it ... and it works! Injected and recovered from a thousand
errors without seeing any problems.
-Tony
P.S. Somewhere in the mail system your patch arrived with <TAB>s
changed to spaces. Here's what I applied to v5.14-rc6 (hopefully with
TABS preserved) ... just in case anyone else is following along with
this thread and wants to try some tests.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index eefd823deb67..aa6592540f17 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1146,7 +1146,7 @@ static int __get_hwpoison_page(struct page *page)
* unexpected races caused by taking a page refcount.
*/
if (!HWPoisonHandlable(head))
- return 0;
+ return -EBUSY;
if (PageTransHuge(head)) {
/*
@@ -1199,9 +1199,14 @@ static int get_any_page(struct page *p, unsigned long flags)
}
goto out;
} else if (ret == -EBUSY) {
- /* We raced with freeing huge page to buddy, retry. */
- if (pass++ < 3)
+ /*
+ * We raced with (possibly temporary) unhandlable
+ * page, retry.
+ */
+ if (pass++ < 3) {
+ shake_page(p, 1);
goto try_again;
+ }
goto out;
}
}
Powered by blists - more mailing lists