linux-kernel - Re: [kerneloops] regression in 2.6.27 wrt "lock

Message-ID: <20081013160259.GA26866@elte.hu>
Date:	Mon, 13 Oct 2008 18:02:59 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Karel Zak <kzak@...hat.com>,
	Arjan van de Ven <arjan@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [kerneloops] regression in 2.6.27 wrt "lock_page" and the
	"hwclock" program


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> 
> 
> On Mon, 13 Oct 2008, Ingo Molnar wrote:
> > 
> > do you agree with the changelog and can i add your Signed-off-by ?
> 
> Sure. One thing I'd still like to see is that crazy "again" vs 
> "survive" mess for x86-64 vs x86-32. I think the patch as posted will 
> cause a new warning on x86-32 due to "unused label 'again'" or 
> similar.
> 
> It's totally insane that we have two different versions of the oom 
> handling for x86. I don't know why we do that, it's probably 
> historical, and I _suspect_ that the 32-bit one has gotten a lot more 
> testing.
> 
> And not just because there have been more of the 32-bit kernels 
> around, but also because low-memory situations are probably more 
> common on 32-bit setups. But I dunno.
> 
> So I would suggest you just pick the x86-32 version of that oom 
> handling thing too. Unless you know some deep reason why the 64-bit 
> one would be superior.

hm, i think the 64-bit case is the correct code, because in this 'init 
task OOMs' case we do:

out_of_memory:
        up_read(&mm->mmap_sem);
        if (is_global_init(tsk)) {
                yield();
                down_read(&mm->mmap_sem);

note that we drop the mmap_sem, so in theory another thread of this same 
MM could change the vma tree, and our 'vma' might not be valid anymore.

It's probably not a real issue in practice, since this path only triggers
for PID 1, but still.
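To make the race concrete, here is a minimal user-space sketch, not kernel code. A pthread rwlock stands in for mmap_sem, and the hypothetical struct region, find_region(), other_thread_unmaps() and lookup_across_unlock() stand in for a VMA, find_vma() and the fault path. The "other thread" is called directly so that the interleaving is deterministic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: the address range [start, end). */
struct region {
    unsigned long start, end;
    int mapped;                 /* cleared when another thread "unmaps" it */
};

static struct region regions[] = {
    { 0x1000, 0x2000, 1 },
    { 0x4000, 0x8000, 1 },
};
#define NREGIONS (sizeof(regions) / sizeof(regions[0]))

/* Plays the role of mm->mmap_sem (a reader/writer lock). */
static pthread_rwlock_t map_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Analogue of find_vma(); the caller must hold map_lock. */
static struct region *find_region(unsigned long addr)
{
    for (size_t i = 0; i < NREGIONS; i++)
        if (regions[i].mapped &&
            addr >= regions[i].start && addr < regions[i].end)
            return &regions[i];
    return NULL;
}

/* Another thread of the same "mm" unmapping a region.  Called directly,
 * not from a real thread, so the interleaving is deterministic. */
static void other_thread_unmaps(unsigned long addr)
{
    pthread_rwlock_wrlock(&map_lock);
    struct region *r = find_region(addr);
    if (r)
        r->mapped = 0;
    pthread_rwlock_unlock(&map_lock);
}

/* Looks up 'addr', drops the lock (as the OOM path does around yield()),
 * then re-takes it.  Returns 1 if 'addr' is still mapped, 0 otherwise.
 * 'race' simulates another thread changing the map in that window. */
static int lookup_across_unlock(unsigned long addr, int race)
{
    pthread_rwlock_rdlock(&map_lock);
    struct region *r = find_region(addr);
    pthread_rwlock_unlock(&map_lock);
    if (!r)
        return 0;

    if (race)
        other_thread_unmaps(addr);

    /* 'r' may now describe a region that no longer exists (in the kernel
     * it could even point at freed memory), so look it up again under the
     * lock instead of reusing it.  A real handler would keep holding the
     * lock while it uses the result. */
    pthread_rwlock_rdlock(&map_lock);
    r = find_region(addr);
    pthread_rwlock_unlock(&map_lock);
    return r != NULL;
}
```

For example, lookup_across_unlock(0x4800, 1) returns 0: the region is unmapped while the lock is dropped, and only the fresh lookup notices it.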

So how about the patch below?

	Ingo

---------------->
From 7b87da331b6ada44ccd5ffeedba76880c825d4fc Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@...e.hu>
Date: Mon, 13 Oct 2008 17:49:02 +0200
Subject: [PATCH] x86/mm: unify init task OOM handling

Linus noticed that the "again:" versus "survive:" OOM logic for
the init task was arbitrarily different between x86-32 and x86-64.

The 64-bit codepath is the better one, because it correctly looks up
the vma again after having dropped ->mmap_sem.

Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 arch/x86/mm/fault.c |   15 ++++++---------
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ac2ad78..8bc5956 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -671,7 +671,8 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 		goto bad_area_nosemaphore;
 
 again:
-	/* When running in the kernel we expect faults to occur only to
+	/*
+	 * When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the
 	 * kernel and should generate an OOPS.  Unfortunately, in the case of an
 	 * erroneous fault occurring in a code path which already holds mmap_sem
@@ -734,9 +735,6 @@ good_area:
 			goto bad_area;
 	}
 
-#ifdef CONFIG_X86_32
-survive:
-#endif
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
@@ -871,12 +869,11 @@ out_of_memory:
 	up_read(&mm->mmap_sem);
 	if (is_global_init(tsk)) {
 		yield();
-#ifdef CONFIG_X86_32
-		down_read(&mm->mmap_sem);
-		goto survive;
-#else
+		/*
+		 * Re-lookup the vma - in theory the vma tree might
+		 * have changed:
+		 */
 		goto again;
-#endif
 	}
 
 	printk("VM: killing process %s\n", tsk->comm);
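The control flow the patch ends up with can also be modelled in user space. This is a sketch under assumptions, not the fault.c code: try_fault() and oom_failures_left are hypothetical stubs standing in for an OOM result from handle_mm_fault(), and is_init mimics is_global_init(tsk). There is a single retry label, and every pass starts from the top, which is where the real handler re-takes mmap_sem and redoes the vma lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>

/* Hypothetical stub: reports "out of memory" until oom_failures_left
 * reaches zero, then reports success. */
static int oom_failures_left;

static int try_fault(void)
{
    if (oom_failures_left > 0) {
        oom_failures_left--;
        return -1;                      /* out of memory */
    }
    return 0;                           /* fault handled */
}

/* Returns the number of passes needed, or -1 where the kernel would kill
 * a non-init task.  'is_init' mimics is_global_init(tsk). */
static int fault_with_init_retry(int is_init)
{
    int passes = 0;

again:
    /* In fault.c this is roughly where mmap_sem is taken and
     * find_vma() runs, so each retry sees a fresh vma. */
    passes++;
    if (try_fault() == 0)
        return passes;

    /* out_of_memory: the kernel drops mmap_sem here. */
    if (is_init) {
        sched_yield();                  /* give other tasks a chance to free memory */
        goto again;                     /* one retry label on both 32 and 64 bit */
    }
    return -1;                          /* "VM: killing process ..." */
}
```

With oom_failures_left = 2, fault_with_init_retry(1) succeeds on the third pass and returns 3, while a non-init task (is_init = 0) gives up on the first OOM and returns -1.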