Message-ID: <alpine.DEB.2.00.1105091246550.32299@artax.karlin.mff.cuni.cz>
Date: Mon, 9 May 2011 13:01:09 +0200 (CEST)
From: Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>
To: Linus Torvalds <torvalds@...ux-foundation.org>
cc: linux-kernel@...r.kernel.org, linux-parisc@...r.kernel.org,
Hugh Dickins <hughd@...gle.com>,
Oleg Nesterov <oleg@...hat.com>, agk@...hat.com
Subject: Re: [PATCH] Don't mlock guardpage if the stack is growing up
On Sun, 8 May 2011, Linus Torvalds wrote:
> On Sun, May 8, 2011 at 11:55 AM, Mikulas Patocka
> <mikulas@...ax.karlin.mff.cuni.cz> wrote:
> >
> > This patch fixes lvm2 on PA-RISC (and possibly other architectures with
> > up-growing stack). lvm2 calculates the number of used pages when locking
> > and when unlocking and reports an internal error if the numbers mismatch.
>
> This patch won't apply on current kernels (including stable) because
> of commit a1fde08c74e9 that changed the test of "pages" to instead
> just test "flags & FOLL_MLOCK".
>
> That should be trivial to fix up.
>
> However, I really don't much like this complex test:
>
> > static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long addr)
> > {
> > - return (vma->vm_flags & VM_GROWSDOWN) &&
> > + return ((vma->vm_flags & VM_GROWSDOWN) &&
> > (vma->vm_start == addr) &&
> > - !vma_stack_continue(vma->vm_prev, addr);
> > + !vma_stack_continue(vma->vm_prev, addr)) ||
> > + ((vma->vm_flags & VM_GROWSUP) &&
> > + (vma->vm_end == addr + PAGE_SIZE) &&
> > + !vma_stack_growsup_continue(vma->vm_next, addr + PAGE_SIZE));
> > }
>
> in that format. It gets really hard to read, and I think you'd be
> better off writing it as two helper functions (or macros) for the two
> cases, and then have
>
> static inline int stack_guard_page(struct vm_area_struct *vma,
> unsigned long addr)
> {
> return stack_guard_page_growsdown(vma, addr) ||
> stack_guard_page_growsup(vma, addr);
> }
>
> I'd also like to verify that it doesn't actually generate any extra
> code for the common case (iirc VM_GROWSUP is 0 for the architectures
> that don't need it, and so the compiler shouldn't generate any extra
> code, but I'd like that mentioned and verified explicitly).
>
> Hmm?
>
> Other than that it looks ok to me.
>
> That said, could we please fix LVM to not do that crazy sh*t in the
> first place? The STACK_GROWSUP case is never going to have a lot of
> testing, this is just sad.
LVM reads the process's maps from /proc/self/maps and locks them with mlock.
Why doesn't it use mlockall()? Because glibc maps all locales into the
process. Glibc packs all locales into a roughly 100MB file and maps that
file into every process; even if the process uses just one locale, glibc
maps them all.
So, when LVM used mlockall(), it consumed over 100MB of memory, which
caused out-of-memory problems in system installers.
So an alternate way of locking was added to LVM --- read all the maps and
lock them, except for the glibc locale file.
The real fix would be to change glibc so that it doesn't map 100MB into
every process.
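For illustration, the selective locking described above can be sketched
roughly like this (this is a minimal sketch, not LVM's actual code; the
function name and the substring match against the locale archive path are
assumptions):

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Lock every mapping listed in /proc/self/maps except those whose
 * backing path contains "skip" (e.g. glibc's locale-archive). */
static int lock_maps_except(const char *skip)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[512];

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		unsigned long start, end;
		char path[256] = "";

		/* format: start-end perms offset dev inode [path] */
		if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s",
			   &start, &end, path) < 2)
			continue;
		if (skip && strstr(path, skip))
			continue;	/* don't lock the locale archive */
		mlock((void *)start, end - start);
	}
	fclose(f);
	return 0;
}
```

Locking by ranges this way keeps memory use proportional to what the
process actually touches, at the cost of re-reading the maps around every
lock/unlock cycle.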
> Linus
>
This is the updated patch. I tested it on x86-64 and it doesn't change the
generated code.
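(The reason nothing changes on x86-64, as you noted: without
CONFIG_STACK_GROWSUP the kernel defines VM_GROWSUP as 0, so the bit-test is
a compile-time constant false and the whole growsup branch is dead code. A
standalone sketch of the effect, with simplified stand-in types:)

```c
/* On architectures without an up-growing stack, VM_GROWSUP is 0. */
#define VM_GROWSUP	0x00000000UL

struct vma { unsigned long vm_flags; };

static inline int growsup_check(const struct vma *v)
{
	return (v->vm_flags & VM_GROWSUP) != 0;	/* folds to constant 0 */
}

int guard_check(const struct vma *v, int growsdown)
{
	/* compiles to just "return growsdown" when VM_GROWSUP == 0 */
	return growsdown || growsup_check(v);
}
```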
Mikulas
---
Don't mlock the guard page if the stack is growing up
The Linux kernel excludes the guard page when performing mlock on a VMA
with a down-growing stack. However, some architectures have an up-growing
stack, and the guard page should be excluded from locking in that case too.
This patch fixes lvm2 on PA-RISC (and possibly other architectures with an
up-growing stack). lvm2 calculates the number of used pages when locking
and when unlocking and reports an internal error if the numbers mismatch.
Signed-off-by: Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>
---
include/linux/mm.h | 10 +++++++++-
mm/memory.c | 21 ++++++++++++++++++---
2 files changed, 27 insertions(+), 4 deletions(-)
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h 2011-05-09 12:33:50.000000000 +0200
+++ linux-2.6/include/linux/mm.h 2011-05-09 12:42:05.000000000 +0200
@@ -1011,11 +1011,19 @@ int set_page_dirty_lock(struct page *pag
int clear_page_dirty_for_io(struct page *page);
/* Is the vma a continuation of the stack vma above it? */
-static inline int vma_stack_continue(struct vm_area_struct *vma, unsigned long addr)
+static inline int vma_stack_growsdown_continue(struct vm_area_struct *vma,
+ unsigned long addr)
{
return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
}
+/* Is the vma a continuation of the stack vma below it? */
+static inline int vma_stack_growsup_continue(struct vm_area_struct *vma,
+ unsigned long addr)
+{
+ return vma && (vma->vm_start == addr) && (vma->vm_flags & VM_GROWSUP);
+}
+
extern unsigned long move_page_tables(struct vm_area_struct *vma,
unsigned long old_addr, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long len);
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c 2011-05-09 12:33:51.000000000 +0200
+++ linux-2.6/mm/memory.c 2011-05-09 12:41:38.000000000 +0200
@@ -1410,11 +1410,21 @@ no_page_table:
return page;
}
-static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long addr)
+static inline int stack_guard_page_growsdown(struct vm_area_struct *vma,
+ unsigned long addr)
{
return (vma->vm_flags & VM_GROWSDOWN) &&
(vma->vm_start == addr) &&
- !vma_stack_continue(vma->vm_prev, addr);
+ !vma_stack_growsdown_continue(vma->vm_prev, addr);
+}
+
+
+static inline int stack_guard_page_growsup(struct vm_area_struct *vma,
+ unsigned long addr)
+{
+ return (vma->vm_flags & VM_GROWSUP) &&
+ (vma->vm_end == addr + PAGE_SIZE) &&
+ !vma_stack_growsup_continue(vma->vm_next, addr + PAGE_SIZE);
}
/**
@@ -1554,13 +1564,18 @@ int __get_user_pages(struct task_struct
/*
* For mlock, just skip the stack guard page.
*/
- if ((gup_flags & FOLL_MLOCK) && stack_guard_page(vma, start))
+ if ((gup_flags & FOLL_MLOCK) &&
+ stack_guard_page_growsdown(vma, start))
goto next_page;
do {
struct page *page;
unsigned int foll_flags = gup_flags;
+ if ((gup_flags & FOLL_MLOCK) &&
+ stack_guard_page_growsup(vma, start))
+ goto next_page;
+
/*
* If we have a pending SIGKILL, don't keep faulting
* pages and potentially allocating memory.