linux-kernel - Re: [PATCH -v2] rmap: make anon_vma_prepare link in all the anon

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 7 Apr 2010 16:37:15 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Rik van Riel <riel@...hat.com>
cc:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Borislav Petkov <bp@...en8.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Minchan Kim <minchan.kim@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Nick Piggin <npiggin@...e.de>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	sgunderson@...foot.com, hannes@...xchg.org
Subject: Re: [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas
 of a mergeable VMA



On Wed, 7 Apr 2010, Linus Torvalds wrote:
> 
> Yes. The failure point is too late to do anything really interesting with, 
> and the old code also just causes a SIGBUS. My intention was to change the 
> 
> 	WARN_ONCE(!vma->anon_vma);
> 
> into returning that SIGBUS - which is not wonderful, but is no different 
> from old failures.

Not SIGBUS, but VM_FAULT_OOM, of course.

IOW, something like this should be no worse than what we have now, and has 
the much nicer locking semantics.

Having done some more digging, I can point to a downside: we do end up 
having about twice as many anon_vma entries. It seems about half of the 
vma's never need an anon_vma entry, probably because they end up being 
read-only file mappings, and thus never trigger the anonvma case.

That said:

 - I don't really think you can fix the locking problem you have in a 
   saner way

 - the anon_vma entry is much smaller than the vm_area_struct, so we're 
   still using much less memory for them than for vma's.

 - We _could_ avoid allocating anonvma entries for shared mappings or for 
   mappings that are read-only. That might force us to allocate some of 
   them at mprotect time, and/or when doing a forced COW event with 
   ptrace, but we have the mmap_sem for writing for the one case, and we 
   could decide to get it for the other.

   So it's not a _fundamental_ problem if we decide we want to recover 
   most of the memory lost by doing unconditional allocations.

There are alternative models. For example, the VM layer _could_ decide to 
just release the mmap_sem, and re-do it and take it for writing if the vma 
doesn't have an anon_vma. 

I dunno. I like how this patch makes things so much less subtle, though.  
For example: with this in place, we could further simplify 
anon_vma_prepare(), since it would now never have the re-entrancy issue 
and wouldn't need to worry about taking that page_table_lock and 
re-testing vma->anon_vma for races.

		Linus

---
 mm/memory.c |   12 +++---------
 mm/mmap.c   |   17 ++++-------------
 2 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 833952d..b5efe76 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2223,9 +2223,6 @@ reuse:
 gotten:
 	pte_unmap_unlock(page_table, ptl);
 
-	if (unlikely(anon_vma_prepare(vma)))
-		goto oom;
-
 	if (is_zero_pfn(pte_pfn(orig_pte))) {
 		new_page = alloc_zeroed_user_highpage_movable(vma, address);
 		if (!new_page)
@@ -2766,8 +2763,6 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	/* Allocate our own private page. */
 	pte_unmap(page_table);
 
-	if (unlikely(anon_vma_prepare(vma)))
-		goto oom;
 	page = alloc_zeroed_user_highpage_movable(vma, address);
 	if (!page)
 		goto oom;
@@ -2863,10 +2858,6 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (flags & FAULT_FLAG_WRITE) {
 		if (!(vma->vm_flags & VM_SHARED)) {
 			anon = 1;
-			if (unlikely(anon_vma_prepare(vma))) {
-				ret = VM_FAULT_OOM;
-				goto out;
-			}
 			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE,
 						vma, address);
 			if (!page) {
@@ -3115,6 +3106,9 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	pmd_t *pmd;
 	pte_t *pte;
 
+	if (!vma->anon_vma)
+		return VM_FAULT_OOM;
+
 	__set_current_state(TASK_RUNNING);
 
 	count_vm_event(PGFAULT);
diff --git a/mm/mmap.c b/mm/mmap.c
index 75557c6..c14284b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -463,6 +463,8 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	mm->map_count++;
 	validate_mm(mm);
+
+	anon_vma_prepare(vma);
 }
 
 /*
@@ -479,6 +481,8 @@ static void __insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
 	BUG_ON(__vma && __vma->vm_start < vma->vm_end);
 	__vma_link(mm, vma, prev, rb_link, rb_parent);
 	mm->map_count++;
+
+	anon_vma_prepare(vma);
 }
 
 static inline void
@@ -1674,12 +1678,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 	if (!(vma->vm_flags & VM_GROWSUP))
 		return -EFAULT;
 
-	/*
-	 * We must make sure the anon_vma is allocated
-	 * so that the anon_vma locking is not a noop.
-	 */
-	if (unlikely(anon_vma_prepare(vma)))
-		return -ENOMEM;
 	anon_vma_lock(vma);
 
 	/*
@@ -1720,13 +1718,6 @@ static int expand_downwards(struct vm_area_struct *vma,
 {
 	int error;
 
-	/*
-	 * We must make sure the anon_vma is allocated
-	 * so that the anon_vma locking is not a noop.
-	 */
-	if (unlikely(anon_vma_prepare(vma)))
-		return -ENOMEM;
-
 	address &= PAGE_MASK;
 	error = security_file_mmap(NULL, 0, 0, 0, address, 1);
 	if (error)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/