Date:	Mon, 11 May 2015 10:51:17 +0300
From:	Vladimir Davydov <vdavydov@...allels.com>
To:	Rik van Riel <riel@...hat.com>, Hugh Dickins <hughd@...gle.com>,
	Christoph Lameter <cl@...ux.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>
CC:	Minchan Kim <minchan@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...e.cz>,
	Greg Thelen <gthelen@...gle.com>,
	Michel Lespinasse <walken@...gle.com>,
	David Rientjes <rientjes@...gle.com>,
	Pavel Emelyanov <xemul@...allels.com>,
	Cyrill Gorcunov <gorcunov@...nvz.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>
Subject: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list

Hi,

I've been arguing with Minchan for a while about whether store-tearing
is possible while setting page->mapping in __page_set_anon_rmap and
friends, see

  http://thread.gmane.org/gmane.linux.kernel.mm/131949/focus=132132

This patch is intended to draw attention to this discussion. It fixes a
race that could happen if store-tearing were possible. The race is as
follows.

In do_wp_page() we can call page_move_anon_rmap(), which sets
page->mapping as follows:

        anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
        page->mapping = (struct address_space *) anon_vma;

The page in question may be on an LRU list, because nowhere in
do_wp_page() do we remove it from the list, nor do we take any
LRU-related locks. Although the page is locked, shrink_active_list()
can still call page_referenced() on it concurrently, because the
latter does not require an anonymous page to be locked.
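For reference, the relevant check in page_referenced() looks roughly
like this (paraphrased from mm/rmap.c of this era, so treat it as a
sketch rather than the exact code): the page lock is only taken, via
trylock, for file or KSM pages, so a plain anonymous page is inspected
without holding the lock:

	if (!is_locked && (!PageAnon(page) || PageKsm(page))) {
		we_locked = trylock_page(page);
		if (!we_locked)
			return 1;
	}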

If the store-tearing described in the thread were possible, we could
hit the following race, resulting in a kernel panic:

  CPU0                          CPU1
  ----                          ----
  do_wp_page                    shrink_active_list
   lock_page                     page_referenced
                                  PageAnon->yes, so skip trylock_page
   page_move_anon_rmap
    page->mapping = anon_vma
                                  rmap_walk
                                   PageAnon->no
                                   rmap_walk_file
                                    BUG
    page->mapping += PAGE_MAPPING_ANON
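To make the CPU0 side concrete, here is what a torn store would amount
to; this is purely hypothetical compiler output, shown for illustration
only:

	/* single store in the source: */
	page->mapping = (struct address_space *)
			((void *) anon_vma + PAGE_MAPPING_ANON);

	/* hypothetically torn into two stores, opening a window in
	 * which PageAnon() observes the flag as clear: */
	page->mapping = (struct address_space *) anon_vma;
	page->mapping = (struct address_space *)
			((unsigned long) page->mapping + PAGE_MAPPING_ANON);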

This patch fixes the race by explicitly forbidding the compiler from
splitting the page->mapping store in __page_set_anon_rmap() and friends,
and the corresponding load in PageAnon(), with the aid of
WRITE_ONCE/READ_ONCE.
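For a pointer-sized field like page->mapping, these macros essentially
reduce to a volatile access, which the compiler must perform as a
single load or store and may not tear, refetch, or fuse. A simplified
sketch (the real definitions in include/linux/compiler.h dispatch on
the size of the object):

	#define WRITE_ONCE(x, val)	(*(volatile typeof(x) *)&(x) = (val))
	#define READ_ONCE(x)		(*(volatile typeof(x) *)&(x))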

Personally, I don't believe this can ever happen with any sane
compiler, because such an "optimization" would only result in two
stores instead of one (note that anon_vma is not a compile-time
constant), but since I may be mistaken, I would like to hear what
synchronization experts think about it.

Thanks,
Vladimir
---
 include/linux/page-flags.h |    3 ++-
 mm/rmap.c                  |    6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5e7c4f50a644..a529e0a35fe9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -320,7 +320,8 @@ PAGEFLAG(Idle, idle)
 
 static inline int PageAnon(struct page *page)
 {
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return ((unsigned long)READ_ONCE(page->mapping) &
+		PAGE_MAPPING_ANON) != 0;
 }
 
 #ifdef CONFIG_KSM
diff --git a/mm/rmap.c b/mm/rmap.c
index eca7416f55d7..aa60c63704e6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -958,7 +958,7 @@ void page_move_anon_rmap(struct page *page,
 	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
 
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
-	page->mapping = (struct address_space *) anon_vma;
+	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 }
 
 /**
@@ -987,7 +987,7 @@ static void __page_set_anon_rmap(struct page *page,
 		anon_vma = anon_vma->root;
 
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
-	page->mapping = (struct address_space *) anon_vma;
+	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 	page->index = linear_page_index(vma, address);
 }
 
@@ -1579,7 +1579,7 @@ static void __hugepage_set_anon_rmap(struct page *page,
 		anon_vma = anon_vma->root;
 
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
-	page->mapping = (struct address_space *) anon_vma;
+	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 	page->index = linear_page_index(vma, address);
 }
 
-- 
1.7.10.4
