Message-ID: <alpine.LFD.1.10.0806230928140.2926@woody.linux-foundation.org>
Date:	Mon, 23 Jun 2008 09:39:49 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Hugh Dickins <hugh@...itas.com>
cc:	Jeff Chua <jeff.chua.linux@...il.com>, Greg KH <gregkh@...e.de>,
	linux-kernel@...r.kernel.org, stable@...nel.org,
	Justin Forbes <jmforbes@...uxtx.org>,
	Zwane Mwaikambo <zwane@....linux.org.uk>,
	"Theodore Ts'o" <tytso@....edu>,
	Randy Dunlap <rdunlap@...otime.net>,
	Dave Jones <davej@...hat.com>,
	Chuck Wolber <chuckw@...ntumlinux.com>,
	Chris Wedgwood <reviews@...cw.f00f.org>,
	Michael Krufky <mkrufky@...uxtv.org>,
	Chuck Ebbert <cebbert@...hat.com>,
	Domenico Andreoli <cavokz@...il.com>, Willy Tarreau <w@....eu>,
	Rodrigo Rubira Branco <rbranco@...checkpoint.com>,
	akpm@...ux-foundation.org, alan@...rguk.ukuu.org.uk,
	Oleg Nesterov <oleg@...sign.ru>, Nick Piggin <npiggin@...e.de>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Ingo Molnar <mingo@...e.hu>, Roland McGrath <roland@...hat.com>
Subject: Re: [patch 2/5] Reinstate ZERO_PAGE optimization in get_user_pages()
 and fix XIP



On Mon, 23 Jun 2008, Hugh Dickins wrote:

> On Mon, 23 Jun 2008, Jeff Chua wrote:
> > 
> > I can confirm that the 2nd patch from Linus fixed the problem.
> > 
> >                http://lkml.org/lkml/2008/6/22/107
> 
> But I'm afraid you've pushed me into taking another look at that
> patch, and I see a problem with it.  To be honest, I've lost the
> plot on this issue, and didn't really get what your problem is,
> nor how Linus expected to be fixing it.

The problem is that the old code said:

 - we can use FOLL_ANON, assuming that the vma has no vm_ops, or has no 
   "fault" callback.

That was fundamentally broken. Because you can have a "nopfn" callback. 
But it's hard to notice, since the whole FOLL_ANON code only _used_ to 
trigger if a whole page table was missing.
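
To make that gap concrete, here is a minimal userspace sketch (not kernel
code) comparing the vm_ops part of the old test with one that also checks
"nopfn". The struct and function names (vma_model, vm_ops_model,
old_looks_anonymous, new_looks_anonymous) are hypothetical stand-ins that
keep only the two callbacks discussed above; the VM_LOCKED and write checks
are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two callbacks that matter for this check. */
struct vm_ops_model {
	int (*fault)(void);	/* stand-in for vm_ops->fault */
	int (*nopfn)(void);	/* stand-in for vm_ops->nopfn */
};

/* Hypothetical model of a vma: only the vm_ops pointer is kept. */
struct vma_model {
	const struct vm_ops_model *vm_ops;
};

static int dummy_handler(void)
{
	return 0;
}

/* Old test: treats the vma as anonymous unless it has a fault callback. */
static int old_looks_anonymous(const struct vma_model *vma)
{
	return !vma->vm_ops || !vma->vm_ops->fault;
}

/* Test that also treats a nopfn callback as "not anonymous". */
static int new_looks_anonymous(const struct vma_model *vma)
{
	return !vma->vm_ops ||
		(!vma->vm_ops->fault && !vma->vm_ops->nopfn);
}

/* A mapping that provides only nopfn, and a truly anonymous one. */
static const struct vm_ops_model nopfn_only_ops = {
	.fault = NULL,
	.nopfn = dummy_handler,
};
static const struct vma_model nopfn_only_vma = { .vm_ops = &nopfn_only_ops };
static const struct vma_model anon_vma_model = { .vm_ops = NULL };
```

In this model the old test accepts nopfn_only_vma as anonymous while the
extended test rejects it; both accept anon_vma_model.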

The VM_LOCKED test was just crazy, but I doubt it was the cause of the 
bug.

> The problem is that "insane" VM_LOCKED test which he has removed.
> I've remembered now what that's about: it's for make_pages_present.

That's still crazy. make_pages_present() already does:

	write = (vma->vm_flags & VM_WRITE) != 0;

and passes that in to "get_user_pages()". So for a writable mapping, we'll 
elide the FOLL_ANON case anyway, and for a read-only mapping we should 
have used ZERO_PAGE. Damn. Oh, well.
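
The interaction between that "write" argument and the VM_LOCKED test can be
sketched in userspace as follows. This is an illustration under stated
assumptions, not the kernel's code: the flag values VM_WRITE_MODEL and
VM_LOCKED_MODEL and all function names are hypothetical, and the region is
assumed to have no vm_ops so that only the flags matter.

```c
#include <assert.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
#define VM_WRITE_MODEL	0x1UL
#define VM_LOCKED_MODEL	0x2UL

/* Mirrors the quoted line: write = (vma->vm_flags & VM_WRITE) != 0; */
static int mpp_write_arg(unsigned long vm_flags)
{
	return (vm_flags & VM_WRITE_MODEL) != 0;
}

/* FOLL_ANON decision when the VM_LOCKED test is present. */
static int foll_anon_with_locked_test(unsigned long vm_flags)
{
	int write = mpp_write_arg(vm_flags);

	return !write && !(vm_flags & VM_LOCKED_MODEL);
}

/* FOLL_ANON decision when the VM_LOCKED test is dropped. */
static int foll_anon_without_locked_test(unsigned long vm_flags)
{
	int write = mpp_write_arg(vm_flags);

	return !write;
}
```

In this model a writable locked region never gets FOLL_ANON either way; the
two variants differ only for a read-only locked region, where dropping the
test allows the ZERO_PAGE path.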

We can certainly re-instate the insane behaviour for mlock(). Not that we 
historically used to - we used to just map in ZERO_PAGE.

> So I think Linus needs to factor that into the final patch,
> whilst at the same time solving whatever is the vmware breakage.

So here's a third patch to test. It removes the VM_SHARED thing just to 
get us closer to the original code (and because do_no_page() didn't do it 
historically, so let's not do it either), and it re-instates the insane 
VM_LOCKED test with a comment.

Jeff, does this still work with vmware?

		Linus

---
 mm/memory.c |   20 ++++++++++++++++++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 9aefaae..a2ce28d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1045,6 +1045,23 @@ no_page_table:
 	return page;
 }
 
+/* Can we do the FOLL_ANON optimization? */
+static inline int use_zero_page(struct vm_area_struct *vma)
+{
+	/*
+	 * We don't want to optimize FOLL_ANON for make_pages_present()
+	 * when it tries to page in a VM_LOCKED region.
+	 */
+	if (vma->vm_flags & VM_LOCKED)
+		return 0;
+	/*
+	 * And if we have a fault or a nopfn routine, it's not an
+	 * anonymous region.
+	 */
+	return !vma->vm_ops ||
+		(!vma->vm_ops->fault && !vma->vm_ops->nopfn);
+}
+
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, int len, int write, int force,
 		struct page **pages, struct vm_area_struct **vmas)
@@ -1119,8 +1136,7 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		foll_flags = FOLL_TOUCH;
 		if (pages)
 			foll_flags |= FOLL_GET;
-		if (!write && !(vma->vm_flags & VM_LOCKED) &&
-		    (!vma->vm_ops || !vma->vm_ops->fault))
+		if (!write && use_zero_page(vma))
 			foll_flags |= FOLL_ANON;
 
 		do {