[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.0901311010460.3150@localhost.localdomain>
Date: Sat, 31 Jan 2009 10:34:30 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Hugh Dickins <hugh@...itas.com>
cc: Lee Schermerhorn <Lee.Schermerhorn@...com>,
Greg KH <gregkh@...e.de>,
Maksim Yevmenkin <maksim.yevmenkin@...il.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Nick Piggin <npiggin@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
will@...wder-design.com, Rik van Riel <riel@...hat.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Mikos Szeredi <miklos@...redi.hu>
Subject: Re: [PATCH] Fix OOPS in mmap_region() when merging adjacent VM_LOCKED
file segments
On Sat, 31 Jan 2009, Hugh Dickins wrote:
>
> We need a way to communicate not-MAP_NORESERVE to shmem.c, and we don't
> just need it in the explicit shmem_zero_setup() case, we also need it
> for the (probably rare nowadays) case when mmap() is working on file
> /dev/zero (drivers/char/mem.c mmap_zero()), rather than using MAP_ANON.
Heh. We already have that. Maybe you didn't realize. Look at VM_NORESERVE.
So shmem.c can just look at "!(vma->vm_flags & VM_NORESERVE)" if it wants
to.
The only piece of information you don't have is that "accountable" flag,
but that was why I was pointing out how totally _useless_ that flag
actually is. We shouldn't pass it along, because it's always 1, except for
one special case that we can always calculate (private hugepages).
But in fact, even leaving that untouched, we can trivially just change
what 'VM_NORESERVE' means: we can just make it the end result of all that
'accountable' logic, instead of having it just mirror MAP_NORESERVE
blindly.
> Still haven't decided what's best to do about it (plenty of diversions):
> perhaps we just say my error was to overload VM_ACCOUNT, and define a
> new flag for the purpose, which can go into VM_MERGEABLE_FLAGS; but
> I'd prefer a neater solution if it crosses my mind.
How about this pretty trivial patch?
TOTALLY UNTESTED. As usual. But the concept is pretty simple, and it
actually removes a fair chunk of hacky code. The only reason the diffstat
output says that it adds more lines than it deletes is that I added more
comments and made that helper inline function rather than make a complex
conditional.
Whaddaya think?
Linus
---
mm/mmap.c | 48 +++++++++++++++++++++++++-----------------------
mm/shmem.c | 2 +-
2 files changed, 26 insertions(+), 24 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index c581df1..5fcaec3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1090,6 +1090,15 @@ int vma_wants_writenotify(struct vm_area_struct *vma)
mapping_cap_account_dirty(vma->vm_file->f_mapping);
}
+/*
+ * We account for memory if it's a private writeable mapping,
+ * and VM_NORESERVE wasn't set.
+ */
+static inline int private_accountable_mapping(unsigned int vm_flags)
+{
+ return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
+}
+
unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, unsigned long flags,
unsigned int vm_flags, unsigned long pgoff,
@@ -1117,23 +1126,24 @@ munmap_back:
if (!may_expand_vm(mm, len >> PAGE_SHIFT))
return -ENOMEM;
- if (flags & MAP_NORESERVE)
+ /*
+ * Set 'VM_NORESERVE' if we should not account for the
+ * memory use of this mapping. We only honor MAP_NORESERVE
+ * if we're allowed to overcommit memory.
+ */
+ if ((flags & MAP_NORESERVE) && sysctl_overcommit_memory != OVERCOMMIT_NEVER)
+ vm_flags |= VM_NORESERVE;
+ if (!accountable)
vm_flags |= VM_NORESERVE;
- if (accountable && (!(flags & MAP_NORESERVE) ||
- sysctl_overcommit_memory == OVERCOMMIT_NEVER)) {
- if (vm_flags & VM_SHARED) {
- /* Check memory availability in shmem_file_setup? */
- vm_flags |= VM_ACCOUNT;
- } else if (vm_flags & VM_WRITE) {
- /*
- * Private writable mapping: check memory availability
- */
- charged = len >> PAGE_SHIFT;
- if (security_vm_enough_memory(charged))
- return -ENOMEM;
- vm_flags |= VM_ACCOUNT;
- }
+ /*
+ * Private writable mapping: check memory availability
+ */
+ if (private_accountable_mapping(vm_flags)) {
+ charged = len >> PAGE_SHIFT;
+ if (security_vm_enough_memory(charged))
+ return -ENOMEM;
+ vm_flags |= VM_ACCOUNT;
}
/*
@@ -1184,14 +1194,6 @@ munmap_back:
goto free_vma;
}
- /* We set VM_ACCOUNT in a shared mapping's vm_flags, to inform
- * shmem_zero_setup (perhaps called through /dev/zero's ->mmap)
- * that memory reservation must be checked; but that reservation
- * belongs to shared memory object, not to vma: so now clear it.
- */
- if ((vm_flags & (VM_SHARED|VM_ACCOUNT)) == (VM_SHARED|VM_ACCOUNT))
- vma->vm_flags &= ~VM_ACCOUNT;
-
/* Can addr have changed??
*
* Answer: Yes, several device drivers can do it in their
diff --git a/mm/shmem.c b/mm/shmem.c
index 5d0de96..19d566c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2628,7 +2628,7 @@ struct file *shmem_file_setup(char *name, loff_t size, unsigned long flags)
goto close_file;
#ifdef CONFIG_SHMEM
- SHMEM_I(inode)->flags = flags & VM_ACCOUNT;
+ SHMEM_I(inode)->flags = (flags & VM_NORESERVE) ? 0 : VM_ACCOUNT;
#endif
d_instantiate(dentry, inode);
inode->i_size = size;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists