Message-ID: <Pine.LNX.4.64.0704061503310.31796@twin.jikos.cz>
Date: Fri, 6 Apr 2007 15:23:05 +0200 (CEST)
From: Jiri Kosina <jikos@...os.cz>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, e1000-devel@...ts.sourceforge.net,
Len Brown <len.brown@...el.com>,
Natalie Protasevich <nataliep@...gle.com>,
Andi Kleen <ak@...e.de>,
Michal Piotrowski <michal.k.k.piotrowski@...il.com>,
auke-jan.h.kok@...el.com
Subject: Re: 2.6.21-rc5-mm4
On Wed, 4 Apr 2007, Eric W. Biederman wrote:
> > And the bisection winner is
> >
> > i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch
> >
> > I don't immediately see how it could be causing it, so adding CCs which
> > are listed in the patch.
> Weird. I will have to look at that in a little more detail.
> Do you know if this problem happens on x86_64? What does your .config
> look like? What does /proc/interrupts look like? What kind of hardware
> you running this kernel on? Can anyone else reproduce this?
> The oops clearly shows something using -1 and calling that as an
> address I don't know why, but I'm guessing I have triggered a memory
> stomp somewhere. I think this is the first time I have seen a small
> negative number causing a NULL pointer dereference.
> That patch looks innocuous enough that either:
> - I just missed changing something I should have.
> - Your configuration has an increase in NR_IRQS and that triggered
> something.
> - The patch simply permuted things so a memory stomp now happens
> on the e1000 data structures instead of somewhere else.
> - Something doesn't like large irq numbers.
> This work is essentially a backport from x86_64 so if your hardware
> is 64bit capable testing that should be a fairly easy test, and be
> able to rule out large irq numbers as the culprit.
> Until I get a good look at -mm I'm going to have a hard time guessing.
> But a roving memory stomp is my best guess.
Hi Eric,
after struggling with this issue for some time, I think that it's just
inconsistent usage of NR_IRQS throughout the source, probably due to
some include hell. I really don't understand how the mach-*/ includes
are supposed to work.

I found out (by disassembling the resulting vmlinux binaries) that in
arch/i386/kernel/entry.S, the loop in irq_entries_start does too few
iterations compared to the NR_IRQS value seen in, for example, io_apic.c.
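
To make the mismatch concrete, here is a minimal C sketch of the situation. It is not kernel code: the helper names are hypothetical, 224 is a made-up stand-in for whatever smaller count entry.S ended up with, and 4096 is the value the proof patch forces in entry.S (that io_apic.c sees the same value is an assumption).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative values only (assumptions, not taken from a real build):
 * 224 stands in for the smaller count seen while assembling entry.S,
 * 4096 is the value the proof patch defines.
 */
#define NR_IRQS_SEEN_BY_ENTRY_S   224
#define NR_IRQS_SEEN_BY_IO_APIC  4096

/* Number of IRQs the C side believes exist but that have no entry stub. */
static size_t irqs_without_stub(size_t stubs_built, size_t irqs_expected)
{
    return irqs_expected > stubs_built ? irqs_expected - stubs_built : 0;
}

/* Non-zero if an entry stub was generated for this IRQ number. */
static int irq_has_stub(size_t irq, size_t stubs_built)
{
    return irq < stubs_built;
}

/* Small self-check showing where the two views of NR_IRQS diverge. */
static void sketch_self_check(void)
{
    /* The last IRQ covered by the stub loop is fine ... */
    assert(irq_has_stub(NR_IRQS_SEEN_BY_ENTRY_S - 1, NR_IRQS_SEEN_BY_ENTRY_S));
    /* ... but the next one, still valid for io_apic.c, has no stub. */
    assert(!irq_has_stub(NR_IRQS_SEEN_BY_ENTRY_S, NR_IRQS_SEEN_BY_ENTRY_S));
    /* With a consistent definition, nothing is left uncovered. */
    assert(irqs_without_stub(NR_IRQS_SEEN_BY_IO_APIC,
                             NR_IRQS_SEEN_BY_IO_APIC) == 0);
}
```

With these illustrative numbers, every IRQ number from 224 upward would be considered valid by the C code while having no stub generated for it, which is the kind of gap the disassembly showed.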

The super-stupid proof-of-concept patch below fixes the panic on my system.
It is only meant to demonstrate that the i386 includes really need fixing so
that NR_IRQS is defined consistently.

diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
index 976438c..b20dc07 100644
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -53,6 +53,8 @@
 #include <asm/dwarf2.h>
 #include "irq_vectors.h"
 
+#define NR_IRQS 4096
+
 /*
  * We use macros for low-level operations which need to be overridden
  * for paravirtualization. The following will never clobber any registers:
--
Jiri Kosina