lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.1001071031440.7821@localhost.localdomain>
Date:	Thu, 7 Jan 2010 10:44:13 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Peter Zijlstra <peterz@...radead.org>
cc:	Christoph Lameter <cl@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"minchan.kim@...il.com" <minchan.kim@...il.com>,
	"hugh.dickins" <hugh.dickins@...cali.co.uk>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()



On Thu, 7 Jan 2010, Linus Torvalds wrote:
> 
> For example: there's no real reason why we take mmap_sem for writing when 
> extending an existing vma. And while 'brk()' is a very oldfashioned way of 
> doing memory management, it's still quite common. So rather than looking 
> at subtle lockless algorithms, why not look at doing the common cases of 
> an extending brk? Make that one take the mmap_sem for _reading_, and then 
> do the extending of the brk area with a simple cmpxchg or something?

I didn't use cmpxchg, because we actually want to update both 
'current->brk' _and_ the vma->vm_end atomically, so here's a totally 
untested patch that uses the page_table_lock spinlock for it instead (it 
could be a new spinlock, not worth it).

It's also totally untested and might be horribly broken. But you get the 
idea.

We could probably do things like this in regular mmap() too for the 
"extend a mmap" case. brk() is just especially simple.

		Linus

---
 mm/mmap.c |   89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 79 insertions(+), 10 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index ee22989..3d07e5f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -242,23 +242,14 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
 	return next;
 }
 
-SYSCALL_DEFINE1(brk, unsigned long, brk)
+static long slow_brk(unsigned long brk)
 {
 	unsigned long rlim, retval;
 	unsigned long newbrk, oldbrk;
 	struct mm_struct *mm = current->mm;
-	unsigned long min_brk;
 
 	down_write(&mm->mmap_sem);
 
-#ifdef CONFIG_COMPAT_BRK
-	min_brk = mm->end_code;
-#else
-	min_brk = mm->start_brk;
-#endif
-	if (brk < min_brk)
-		goto out;
-
 	/*
 	 * Check against rlimit here. If this check is done later after the test
 	 * of oldbrk with newbrk then it can escape the test and let the data
@@ -297,6 +288,84 @@ out:
 	return retval;
 }
 
+/*
+ * NOTE NOTE NOTE!
+ *
+ * We do all this speculatively with just the read lock held.
+ * If anything goes wrong, we release the read lock, and punt
+ * to the traditional write-locked version.
+ *
+ * Here "goes wrong" includes:
+ *  - having to unmap the area and actually shrink it
+ *  - the final cmpxchg doesn't succeed
+ *  - the old brk area wasn't a simple anonymous vma
+ *  etc etc
+ */
+static struct vm_area_struct *ok_to_extend_brk(struct mm_struct *mm, unsigned long cur_brk, unsigned long brk)
+{
+	struct vm_area_struct *vma, *nextvma;
+
+	nextvma = find_vma_prev(mm, cur_brk, &vma);
+	if (vma) {
+		/*
+		 * Needs to be an anonymous vma that ends at the current brk,
+		 * and with no following vma that would start before the
+		 * new brk
+		 */
+		if (vma->vm_file || cur_brk != vma->vm_end || (nextvma && brk > nextvma->vm_start))
+			vma = NULL;
+	}
+	return vma;
+}
+
+SYSCALL_DEFINE1(brk, unsigned long, brk)
+{
+	unsigned long cur_brk, min_brk;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+
+	down_read(&mm->mmap_sem);
+	cur_brk = mm->brk;
+#ifdef CONFIG_COMPAT_BRK
+	min_brk = mm->end_code;
+#else
+	min_brk = mm->start_brk;
+#endif
+	if (brk < min_brk)
+		goto out;
+
+	brk = PAGE_ALIGN(brk);
+	cur_brk = PAGE_ALIGN(cur_brk);
+
+	if (brk < cur_brk)
+		goto slow_case;
+	if (brk == cur_brk)
+		goto out;
+
+	vma = ok_to_extend_brk(mm, cur_brk, brk);
+	if (!vma)
+		goto slow_case;
+
+	spin_lock(&mm->page_table_lock);
+	if (vma->vm_end == cur_brk) {
+		vma->vm_end = brk;
+		mm->brk = brk;
+		cur_brk = brk;
+	}
+	spin_unlock(&mm->page_table_lock);
+
+	if (cur_brk != brk)
+		goto slow_case;
+
+out:
+	up_read(&mm->mmap_sem);
+	return cur_brk;
+
+slow_case:
+	up_read(&mm->mmap_sem);
+	return slow_brk(brk);
+}
+
 #ifdef DEBUG_MM_RB
 static int browse_rb(struct rb_root *root)
 {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ