[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1377479931-7430-3-git-send-email-liwanp@linux.vnet.ibm.com>
Date: Mon, 26 Aug 2013 09:18:46 +0800
From: Wanpeng Li <liwanp@...ux.vnet.ibm.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Andi Kleen <andi@...stfloor.org>,
Fengguang Wu <fengguang.wu@...el.com>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Tony Luck <tony.luck@...el.com>, gong.chen@...ux.intel.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Wanpeng Li <liwanp@...ux.vnet.ibm.com>
Subject: [PATCH v3 3/8] mm/hwpoison: fix race against poison thp
v1 -> v2:
* unpoison thp fail
There is a race between hwpoison page and unpoison page, memory_failure
set the page hwpoison and increase num_poisoned_pages without hold page
lock, and one page count will be accounted against thp for num_poisoned_pages.
However, unpoison can occur before memory_failure hold page lock and
split transparent hugepage, unpoison will decrease num_poisoned_pages
by 1 << compound_order since memory_failure has not yet split transparent
hugepage with page lock held. That means we account one page for hwpoison
and 1 << compound_order for unpoison. This patch fix it by inserting a
PageTransHuge check before doing TestClearPageHWPoison, unpoison failed
without clearing PageHWPoison and decreasing num_poisoned_pages.
A B
memory_failue
TestSetPageHWPoison(p);
if (PageHuge(p))
nr_pages = 1 << compound_order(hpage);
else
nr_pages = 1;
atomic_long_add(nr_pages, &num_poisoned_pages);
unpoison_memory
nr_pages = 1<< compound_trans_order(page);
if(TestClearPageHWPoison(p))
atomic_long_sub(nr_pages, &num_poisoned_pages);
lock page
if (!PageHWPoison(p))
unlock page and return
hwpoison_user_mappings
if (PageTransHuge(hpage))
split_huge_page(hpage);
Suggested-by: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@...ux.vnet.ibm.com>
---
mm/memory-failure.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5a4f4d6..a6c4752 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1339,6 +1339,16 @@ int unpoison_memory(unsigned long pfn)
return 0;
}
+ /*
+ * unpoison_memory() can encounter thp only when the thp is being
+ * worked by memory_failure() and the page lock is not held yet.
+ * In such case, we yield to memory_failure() and make unpoison fail.
+ */
+ if (PageTransHuge(page)) {
+ pr_info("MCE: Memory failure is now running on %#lx\n", pfn);
+ return 0;
+ }
+
nr_pages = 1 << compound_order(page);
if (!get_page_unless_zero(page)) {
--
1.8.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists