Date:   Mon, 26 Sep 2016 11:22:29 -0400
From:   zi.yan@...t.com
To:     linux-kernel@...r.kernel.org, linux-mm@...ck.org
Cc:     benh@...nel.crashing.org, mgorman@...hsingularity.net,
        kirill.shutemov@...ux.intel.com, akpm@...ux-foundation.org,
        dave.hansen@...ux.intel.com, n-horiguchi@...jp.nec.com
Subject: [PATCH v1 07/12] mm: hwpoison: fix race between unpoisoning and freeing migrate source page

From: Naoya Horiguchi <n-horiguchi@...jp.nec.com>

While testing thp migration, I saw a BUG_ON triggered by a race between soft
offline and unpoison (what I actually saw was a "bad page" warning about freeing
a page with PageActive set; the subsequent bug messages differed each time).

I have tried to solve similar problems a few times (see commit f4c18e6f7b5b
("mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*")), but the new
workload exposes a new problem with the previous solution.

Unpoison never works well if the target page is not properly contained, so I'm
now going in the direction of limiting the unpoison function (as commit
230ac719c500 ("mm/hwpoison: don't try to unpoison containment-failed pages")
does). This patch takes another step in that direction by ensuring that the
target page is kicked out of any pcplist. With this change, the dirty hack of
calling put_page() instead of putback_lru_page() when the migration reason is
MR_MEMORY_FAILURE is no longer necessary, so it's reverted.

Signed-off-by: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
---
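Note on the second hunk below: TestSetPageHWPoison() atomically sets the flag
and returns its previous value, so num_poisoned_pages_inc() runs only for the
caller that actually flips the flag. The following is a minimal userspace
sketch of that idiom using C11 atomics, not kernel code; struct toy_page,
TOY_PG_HWPOISON, and every toy_* function are hypothetical names invented for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only a flags word. */
struct toy_page {
	atomic_ulong flags;
};

/* Hypothetical bit position, not the kernel's PG_hwpoison value. */
#define TOY_PG_HWPOISON (1UL << 0)

/* Toy counterpart of the global poisoned-page counter. */
static atomic_long toy_num_poisoned_pages;

/* Atomically set the flag; return true if it was already set. */
static bool toy_test_set_hwpoison(struct toy_page *page)
{
	unsigned long old = atomic_fetch_or(&page->flags, TOY_PG_HWPOISON);

	return (old & TOY_PG_HWPOISON) != 0;
}

/* Count the page only on the 0 -> 1 transition of the flag. */
static void toy_mark_poisoned(struct toy_page *page)
{
	if (!toy_test_set_hwpoison(page))
		atomic_fetch_add(&toy_num_poisoned_pages, 1);
}

/* Mark one page twice and return how much the counter grew. */
static long toy_demo(void)
{
	struct toy_page page;
	long before = atomic_load(&toy_num_poisoned_pages);

	atomic_init(&page.flags, 0UL);
	toy_mark_poisoned(&page);
	toy_mark_poisoned(&page);	/* flag already set: no increment */
	assert(atomic_load(&page.flags) & TOY_PG_HWPOISON);

	return atomic_load(&toy_num_poisoned_pages) - before;
}
```

In this sketch toy_demo() returns 1: the second toy_mark_poisoned() call sees
the flag already set and leaves the counter alone.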
 mm/memory-failure.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index de88f33..e105f91 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1435,6 +1435,13 @@ int unpoison_memory(unsigned long pfn)
 		return 0;
 	}
 
+	/*
+	 * Soft-offlined pages might stay on a PCP list because they are
+	 * freed via putback_lru_page(). Such pages shouldn't be unpoisoned
+	 * because that could cause list corruption, so drain them first.
+	 */
+	shake_page(page, 0);
+
 	nr_pages = 1 << compound_order(page);
 
 	if (!get_hwpoison_page(p)) {
@@ -1678,7 +1685,8 @@ static int __soft_offline_page(struct page *page, int flags)
 				pfn, ret, page->flags);
 			if (ret > 0)
 				ret = -EIO;
-		}
+		} else if (!TestSetPageHWPoison(page))
+			num_poisoned_pages_inc();
 	} else {
 		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
 			pfn, ret, page_count(page), page->flags);
-- 
2.9.3
