lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 12 Apr 2010 09:26:57 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Borislav Petkov <bp@...en8.de>
cc:	Johannes Weiner <hannes@...xchg.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Rik van Riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Minchan Kim <minchan.kim@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Nick Piggin <npiggin@...e.de>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	sgunderson@...foot.com
Subject: Re: [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas
 of a mergeable VMA



On Mon, 12 Apr 2010, Linus Torvalds wrote:
> 
> Yes, and that is supposed to be a no-no. The page is clearly associated 
> with the vma in question (since we are unmapping it through that vma), but 
> the vma list of 'anon_vma's doesn't actually have the one that 
> 'page->mapping' points to.
> 
> And that, in turn, means that we've lost sight of the 'page->mapping' 
> anon_vma, and THAT in turn means that it could well have been free'd as 
> being no longer referenced.
> 
> And if it was free'd, it could be re-allocated as something else (after 
> the RCU grace period), and that directly explains your oops.

I have a new theory. And this new theory is completely different from all 
the other things we've been looking at.

The new theory is really simple: 'page->mapping' has been re-set to the 
wrong mapping. 

Now, there is one case where we reset page->mapping _intentionally_, 
namely in the COW-breaking case of having the last user 
("page_move_anon_rmap"). And that looks fine, and happens under normal 
loads all the time. We _want_ to do it there.

But there is a _much_ more subtle case that involved swapping.

So guys, here's my fairly simple theory on what happens:

 - page gets allocated/mapped by process A. Let's call the anon_vma we 
   associate the page with 'A' to keep it easy to track.

 - Process A forks, creating process B. The anon_vma in B is 'B', and has 
   a chain that looks like 'B' -> 'A'. Everything is fine.

 - Swapping happens. The page (with mapping pointing to 'A') gets swapped 
   out (perhaps not to disk - it's enough to assume that it's just not 
   mapped any more, and lives entirely in the swap-cache)

 - Process B pages it in, which goes like this:

	do_swap_page ->
	  page = lookup_swap_cache(entry);
	 ...
	  set_pte_at(mm, address, page_table, pte);
	  page_add_anon_rmap(page, vma, address);

   And think about what happens here!

In particular, what happens is that this will now be the "first" mapping 
of that page, so page_add_anon_rmap() will do

	if (first)
		__page_set_anon_rmap(page, vma, address);

and notice what anon_vma it will use? It will use the anon_vma for process 
B!

So now page->mapping actually points to anon_vma 'B', not 'A' like it used 
to.

What happens then? Trivial: process 'A' also pages it in (nothing happens, 
it's not the first mapping), and then process 'B' execve's or exits or 
unmaps, making anon_vma B go away.

End result: process A has a page that points to anon_vma B, but anon_vma B 
does not exist any more. This can go on forever. Forget about RCU grace 
periods, forget about locking, forget anything like that. The bug is 
simply that page->mapping points to an anon_vma that was correct at one 
point, but was _not_ the one that was shared by all users of that possible 
mapping.

The patch below is my largely mindless try at fixing this. It's untested. 
I'm not entirely sure that it actually works. But it makes some amount of 
conceptual sense. No?

			Linus

---
 mm/rmap.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index ee97d38..4bad326 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -734,9 +734,20 @@ void page_move_anon_rmap(struct page *page,
 static void __page_set_anon_rmap(struct page *page,
 	struct vm_area_struct *vma, unsigned long address)
 {
-	struct anon_vma *anon_vma = vma->anon_vma;
+	struct anon_vma_chain *avc;
+	struct anon_vma *anon_vma;
+
+	BUG_ON(!vma->anon_vma);
+
+	/*
+	 * We must use the _oldest_ possible anon_vma for the page mapping!
+	 *
+	 * So take the last AVC chain entry in the vma, which is the deepest
+	 * ancestor, and use the anon_vma from that.
+	 */
+	avc = list_entry(vma->anon_vma_chain.prev, struct anon_vma_chain, same_vma);
+	anon_vma = avc->anon_vma;
 
-	BUG_ON(!anon_vma);
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
 	page->mapping = (struct address_space *) anon_vma;
 	page->index = linear_page_index(vma, address);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ