Message-ID: <Pine.LNX.4.64.0701050827290.3661@woody.osdl.org>
Date: Fri, 5 Jan 2007 08:49:24 -0800 (PST)
From: Linus Torvalds <torvalds@...l.org>
To: Alistair John Strachan <s0348365@....ed.ac.uk>
cc: Mikael Pettersson <mikpe@...uu.se>, 76306.1226@...puserve.com,
akpm@...l.org, bunk@...sta.de, greg@...ah.com,
linux-kernel@...r.kernel.org, yanmin_zhang@...ux.intel.com
Subject: Re: kernel + gcc 4.1 = several problems

On Fri, 5 Jan 2007, Alistair John Strachan wrote:
>
> (I realise with problems like these it's almost always some sort of obscure
> hardware problem, but I find that very difficult to believe when I can toggle
> from 3 years of stability to 6-18 hours crashing by switching compiler. I've
> also ran extensive stability test programs on the hardware with absolutely no
> negative results.)

The thing is, I agree with you - it does seem to be compiler-related. But
at the same time, I'm almost positive that it's not in "pipe_poll()"
itself, because that function is just too simple, and looking at the
assembly code, I don't see how what you describe could happen in THAT
function.

HOWEVER.

I can easily see an NMI coming in, or another interrupt, or something, and
that one corrupting the stack under it because of a compiler bug (or a
kernel bug that just needs a specific compiler to trigger). For example,
we've had problems before with the compiler thinking it owns the stack
frame for an "asmlinkage" function, and us having no way to tell the
compiler to keep its hands off - so the compiler ended up touching
registers that were actually in the "save area" of the interrupt or system
call, and then returning with corrupted state.

Here's a stupid patch. It just adds more debugging to the oops message,
and shows all the code pointers it can find on the WHOLE stack.

It also makes the raw stack dumping print out as much of the stack
contents _under_ the stack pointer as it does above it too.

However, this patch is mostly useless if you have a separate stack for
IRQs (since if that happens, any interrupt will be taken on a different
stack which we don't see any more), so you should NOT enable the 4KSTACKS
config option if you try this out.

I'm not sure how enlightening any of the output might be, but it is
probably worth trying.

Linus
---
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 0efad8a..2359eed 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -243,6 +243,20 @@ void show_trace(struct task_struct *task, struct pt_regs *regs,
 	show_trace_log_lvl(task, regs, stack, "");
 }
 
+static void show_all_stack_addresses(unsigned long *esp)
+{
+	struct thread_info *tinfo = (void *) ((unsigned long)esp & (~(THREAD_SIZE - 1)));
+	unsigned long *stack = (unsigned long *)(tinfo+1);
+
+	printk("All stack code pointers:\n");
+	while (valid_stack_ptr(tinfo, stack)) {
+		unsigned long addr = *stack++;
+		if (__kernel_text_address(addr))
+			print_symbol(" %s", addr);
+	}
+	printk("\n");
+}
+
 static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			       unsigned long *esp, char *log_lvl)
 {
@@ -256,8 +270,10 @@ static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		esp = (unsigned long *)&esp;
 	}
 
+	show_all_stack_addresses(esp);
 	stack = esp;
-	for(i = 0; i < kstack_depth_to_print; i++) {
+	stack -= kstack_depth_to_print;
+	for(i = 0; i < 2*kstack_depth_to_print; i++) {
 		if (kstack_end(stack))
 			break;
 		if (i && ((i % 8) == 0))
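
The address arithmetic in the patch can be tried outside the kernel. What
follows is a minimal userspace sketch, not kernel code: it assumes an 8 KiB
THREAD_SIZE (i.e. 4KSTACKS disabled), models the stack as a plain array of
words, and replaces __kernel_text_address() with a hypothetical fixed
address range. The names thread_block_base, in_text_range,
collect_code_pointers and dump_window are made up for illustration. Unlike
the patch, it does not skip a thread_info header at the start of the block,
and it clamps the dump window at both ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THREAD_SIZE 8192u	/* assumption: 8 KiB per-thread stack block */

/* Round a stack pointer down to the start of its THREAD_SIZE-aligned block,
 * the same masking the patch uses to locate thread_info from esp. */
static uintptr_t thread_block_base(uintptr_t sp)
{
	return sp & ~((uintptr_t)THREAD_SIZE - 1);
}

/* Hypothetical stand-in for __kernel_text_address(): a half-open range. */
static int in_text_range(uintptr_t addr, uintptr_t text_start,
			 uintptr_t text_end)
{
	return addr >= text_start && addr < text_end;
}

/* Scan every word of the block, not just the part above the stack pointer,
 * and record the values that look like code addresses. Returns the total
 * number of hits; at most max_out of them are stored in out[]. */
static size_t collect_code_pointers(const uintptr_t *words, size_t nwords,
				    uintptr_t text_start, uintptr_t text_end,
				    uintptr_t *out, size_t max_out)
{
	size_t found = 0;

	for (size_t i = 0; i < nwords; i++) {
		if (in_text_range(words[i], text_start, text_end)) {
			if (found < max_out)
				out[found] = words[i];
			found++;
		}
	}
	return found;
}

/* Window of word indexes for the raw dump: depth words below the stack
 * pointer and depth words from it upward, clamped to the block. */
struct window {
	size_t first;	/* index of the first word to print */
	size_t count;	/* number of words to print */
};

static struct window dump_window(size_t sp_index, size_t depth, size_t nwords)
{
	struct window w;
	size_t last;

	assert(sp_index <= nwords);
	w.first = sp_index > depth ? sp_index - depth : 0;
	last = sp_index + depth < nwords ? sp_index + depth : nwords;
	w.count = last - w.first;
	return w;
}
```

With these definitions, a stack pointer of 0x12345 maps to block base
0x12000, and dump_window(5, 2, 10) selects words 3 through 6: two below the
stack pointer and two from it upward.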