linux-kernel - [PATCH 1/1] mm: fix the theoretical compound_lock() vs prep_new

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20131219190920.GB24566@redhat.com>
Date:	Thu, 19 Dec 2013 20:09:20 +0100
From:	Oleg Nesterov <oleg@...hat.com>
To:	Andrea Arcangeli <aarcange@...hat.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Dave Jones <davej@...hat.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Mel Gorman <mgorman@...e.de>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>
Subject: [PATCH 1/1] mm: fix the theoretical compound_lock() vs
	prep_new_page() race

get/put_page(thp_tail) paths do get_page_unless_zero(page_head) +
compound_lock(). In theory this page_head can be already freed and
reallocated as alloc_pages(__GFP_COMP, smaller_order). In this case
get_page_unless_zero() can succeed right after set_page_refcounted(),
and compound_lock() can race with the non-atomic __SetPageHead().

Perhaps we should rework the thp locking (under discussion), but
until then this patch moves set_page_refcounted() and adds wmb()
to ensure that page->_count != 0 comes as a last change.

I am not sure about other callers of set_page_refcounted(), but at
first glance they look fine to me.

Signed-off-by: Oleg Nesterov <oleg@...hat.com>
---
 mm/page_alloc.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 115b23b..9402337 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -865,8 +865,6 @@ static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
 	}
 
 	set_page_private(page, 0);
-	set_page_refcounted(page);
-
 	arch_alloc_page(page, order);
 	kernel_map_pages(page, 1 << order, 1);
 
@@ -876,6 +874,14 @@ static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
 	if (order && (gfp_flags & __GFP_COMP))
 		prep_compound_page(page, order);
 
+	/*
+	 * Make sure the caller of get_page_unless_zero() will see the
+	 * fully initialized page. Say, to ensure that compound_lock()
+	 * can't race with the non-atomic __SetPage*() above.
+	 */
+	smp_wmb();
+	set_page_refcounted(page);
+
 	return 0;
 }
 
-- 
1.5.5.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/