lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 31 Jan 2009 10:34:30 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Hugh Dickins <hugh@...itas.com>
cc:	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Greg KH <gregkh@...e.de>,
	Maksim Yevmenkin <maksim.yevmenkin@...il.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Nick Piggin <npiggin@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	will@...wder-design.com, Rik van Riel <riel@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Mikos Szeredi <miklos@...redi.hu>
Subject: Re: [PATCH] Fix OOPS in mmap_region() when merging adjacent VM_LOCKED
 file segments



On Sat, 31 Jan 2009, Hugh Dickins wrote:
>
> We need a way to communicate not-MAP_NORESERVE to shmem.c, and we don't
> just need it in the explicit shmem_zero_setup() case, we also need it
> for the (probably rare nowadays) case when mmap() is working on file
> /dev/zero (drivers/char/mem.c mmap_zero()), rather than using MAP_ANON.

Heh. We already have that. Maybe you didn't realize. Look at VM_NORESERVE.

So shmem.c can just look at "!(vma->vm_flags & VM_NORESERVE)" if it wants 
to.

The only piece of information you don't have is that "accountable" flag, 
but that was why I was pointing out how totally _useless_ that flag 
actually is. We shouldn't pass it along, because it's always 1, except for 
one special case that we can always calculate (private hugepages).

But in fact, even leaving that untouched, we can trivially just change 
what 'VM_NORESERVE' means: we can just make it the end result of all that 
'accountable' logic, instead of having it just mirror MAP_NORESERVE 
blindly.

> Still haven't decided what's best to do about it (plenty of diversions):
> perhaps we just say my error was to overload VM_ACCOUNT, and define a
> new flag for the purpose, which can go into VM_MERGEABLE_FLAGS; but
> I'd prefer a neater solution if it crosses my mind.

How about this pretty trivial patch?

TOTALLY UNTESTED. As usual. But the concept is pretty simple, and it 
actually removes a fair chunk of hacky code.  The only reason the diffstat 
output says that it adds more lines than it deletes is that I added more 
comments and made that helper inline function rather than make a complex 
conditional.

Whaddaya think?

		Linus

---
 mm/mmap.c  |   48 +++++++++++++++++++++++++-----------------------
 mm/shmem.c |    2 +-
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index c581df1..5fcaec3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1090,6 +1090,15 @@ int vma_wants_writenotify(struct vm_area_struct *vma)
 		mapping_cap_account_dirty(vma->vm_file->f_mapping);
 }
 
+/*
+ * We account for memory if it's a private writeable mapping,
+ * and VM_NORESERVE wasn't set.
+ */
+static inline int private_accountable_mapping(unsigned int vm_flags)
+{
+	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
+}
+
 unsigned long mmap_region(struct file *file, unsigned long addr,
 			  unsigned long len, unsigned long flags,
 			  unsigned int vm_flags, unsigned long pgoff,
@@ -1117,23 +1126,24 @@ munmap_back:
 	if (!may_expand_vm(mm, len >> PAGE_SHIFT))
 		return -ENOMEM;
 
-	if (flags & MAP_NORESERVE)
+	/*
+	 * Set 'VM_NORESERVE' if we should not account for the
+	 * memory use of this mapping. We only honor MAP_NORESERVE
+	 * if we're allowed to overcommit memory.
+	 */
+	if ((flags & MAP_NORESERVE) && sysctl_overcommit_memory != OVERCOMMIT_NEVER)
+		vm_flags |= VM_NORESERVE;
+	if (!accountable)
 		vm_flags |= VM_NORESERVE;
 
-	if (accountable && (!(flags & MAP_NORESERVE) ||
-			    sysctl_overcommit_memory == OVERCOMMIT_NEVER)) {
-		if (vm_flags & VM_SHARED) {
-			/* Check memory availability in shmem_file_setup? */
-			vm_flags |= VM_ACCOUNT;
-		} else if (vm_flags & VM_WRITE) {
-			/*
-			 * Private writable mapping: check memory availability
-			 */
-			charged = len >> PAGE_SHIFT;
-			if (security_vm_enough_memory(charged))
-				return -ENOMEM;
-			vm_flags |= VM_ACCOUNT;
-		}
+	/*
+	 * Private writable mapping: check memory availability
+	 */
+	if (private_accountable_mapping(vm_flags)) {
+		charged = len >> PAGE_SHIFT;
+		if (security_vm_enough_memory(charged))
+			return -ENOMEM;
+		vm_flags |= VM_ACCOUNT;
 	}
 
 	/*
@@ -1184,14 +1194,6 @@ munmap_back:
 			goto free_vma;
 	}
 
-	/* We set VM_ACCOUNT in a shared mapping's vm_flags, to inform
-	 * shmem_zero_setup (perhaps called through /dev/zero's ->mmap)
-	 * that memory reservation must be checked; but that reservation
-	 * belongs to shared memory object, not to vma: so now clear it.
-	 */
-	if ((vm_flags & (VM_SHARED|VM_ACCOUNT)) == (VM_SHARED|VM_ACCOUNT))
-		vma->vm_flags &= ~VM_ACCOUNT;
-
 	/* Can addr have changed??
 	 *
 	 * Answer: Yes, several device drivers can do it in their
diff --git a/mm/shmem.c b/mm/shmem.c
index 5d0de96..19d566c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2628,7 +2628,7 @@ struct file *shmem_file_setup(char *name, loff_t size, unsigned long flags)
 		goto close_file;
 
 #ifdef CONFIG_SHMEM
-	SHMEM_I(inode)->flags = flags & VM_ACCOUNT;
+	SHMEM_I(inode)->flags = (flags & VM_NORESERVE) ? 0 : VM_ACCOUNT;
 #endif
 	d_instantiate(dentry, inode);
 	inode->i_size = size;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ