Message-ID: <alpine.LFD.2.00.0810130908150.3288@nehalem.linux-foundation.org>
Date: Mon, 13 Oct 2008 09:08:55 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Ingo Molnar <mingo@...e.hu>
cc: Karel Zak <kzak@...hat.com>,
Arjan van de Ven <arjan@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Nick Piggin <nickpiggin@...oo.com.au>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [kerneloops] regression in 2.6.27 wrt "lock_page" and the
"hwclock" program
On Mon, 13 Oct 2008, Ingo Molnar wrote:
>
> hm, i think the 64-bit case is the correct code, because in this 'init
> task OOMs' case we do:
>
> out_of_memory:
> 	up_read(&mm->mmap_sem);
> 	if (is_global_init(tsk)) {
> 		yield();
> 		down_read(&mm->mmap_sem);
>
> note that we drop the mmap_sem, so in theory another thread of this same
> MM could change the vma tree, and our 'vma' might not be valid anymore.
Hmm. Looks about right.
> It's probably not a real issue in practice because this is about PID 1,
> so i doubt it really matters, but still.
>
> So how about the patch below?
Ack. As long as we don't have two versions and the code is impossible to
look at.
Linus
>
> Ingo
>
> ---------------->
> From 7b87da331b6ada44ccd5ffeedba76880c825d4fc Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@...e.hu>
> Date: Mon, 13 Oct 2008 17:49:02 +0200
> Subject: [PATCH] x86/mm: unify init task OOM handling
>
> Linus noticed that the "again:" versus "survive:" OOM logic for
> the init task was arbitrarily different.
>
> The 64-bit codepath is the better one, because it correctly re-lookups
> the vma after having dropped the ->mmap_sem.
>
> Signed-off-by: Ingo Molnar <mingo@...e.hu>
> ---
> arch/x86/mm/fault.c | 15 ++++++---------
> 1 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index ac2ad78..8bc5956 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -671,7 +671,8 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  		goto bad_area_nosemaphore;
>
>  again:
> -	/* When running in the kernel we expect faults to occur only to
> +	/*
> +	 * When running in the kernel we expect faults to occur only to
>  	 * addresses in user space. All other faults represent errors in the
>  	 * kernel and should generate an OOPS. Unfortunately, in the case of an
>  	 * erroneous fault occurring in a code path which already holds mmap_sem
> @@ -734,9 +735,6 @@ good_area:
>  			goto bad_area;
>  	}
>
> -#ifdef CONFIG_X86_32
> -survive:
> -#endif
>  	/*
>  	 * If for any reason at all we couldn't handle the fault,
>  	 * make sure we exit gracefully rather than endlessly redo
> @@ -871,12 +869,11 @@ out_of_memory:
>  	up_read(&mm->mmap_sem);
>  	if (is_global_init(tsk)) {
>  		yield();
> -#ifdef CONFIG_X86_32
> -		down_read(&mm->mmap_sem);
> -		goto survive;
> -#else
> +		/*
> +		 * Re-lookup the vma - in theory the vma tree might
> +		 * have changed:
> +		 */
>  		goto again;
> -#endif
>  	}
>
>  	printk("VM: killing process %s\n", tsk->comm);
>