Message-ID: <4AD1A446.2000602@gmail.com>
Date: Sun, 11 Oct 2009 11:24:22 +0200
From: Jarek Poplawski <jarkao2@...il.com>
To: Jesse Brandeburg <jesse.brandeburg@...il.com>
CC: Tejun Heo <tj@...nel.org>, Frans Pop <elendil@...net.nl>,
Jesse Brandeburg <jesse.brandeburg@...el.com>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
Ingo Molnar <mingo@...e.hu>, hpa@...or.com
Subject: Re: bisect results of MSI-X related panic (help!)
Jesse Brandeburg wrote, On 10/10/2009 02:24 AM:
> On Mon, Sep 14, 2009 at 2:43 AM, Tejun Heo <tj@...nel.org> wrote:
>> Tejun Heo wrote:
>>> Frans Pop wrote:
>>>> Jesse Brandeburg wrote:
>>>>> I've bisected, here is my bisect log, problem is that the commit
>>>>> identified is a merge commit, and *I don't know what to revert to test*.
>>>>> It appears the parent of the merge:
>>>>> 6e15cf04860074ad032e88c306bea656bbdd0f22 is marked good, but looks to be
>>>>> in a possibly related area to the panic.
>>>> That merge does contain quite a few merge fixups, so it's quite possible
>>>> one of them is the cause of the failure.
>>>> Maybe the simplest way to verify that is to compile both parents of the
> >>>> merge to double-check that they work OK. Then, if a compile of the merge
>>>> itself is bad, the problem really is in the merge commit itself.
>>>>
>>>> That commit is the "percpu" merge, so I've added Tejun (author of most of
>>>> that branch) and Ingo (merger) in CC.
>>> Sorry, the oops doesn't ring a bell, well, not yet at least. It would
>>> be great if the bisection can be narrowed down more.
>> Also, building w/ debug option on, capturing more oops traces and
>> pasting gdb output of l *<oops address> might shed some more light.
>
> Okay, it has been a while and I have an update on this issue. The
> actual panic seems to have disappeared in 2.6.32-rc1(2). However, with
> CONFIG_CC_STACKPROTECTOR=y I am still panicking. The stack protector
> fault shows only this message; no backtrace is listed:
>
> Kernel stack is corrupted in: ffffffff810b5b31
>
> I've built with a full debug kernel before this crash, so I did:
>
> (gdb) l *0xffffffff810b5b31
> 0xffffffff810b5b31 is in move_native_irq (kernel/irq/migration.c:67).
> 62              return;
> 63
> 64      desc->chip->mask(irq);
> 65      move_masked_irq(irq);
> 66      desc->chip->unmask(irq);
>>>> 67 }
> 68
> (gdb) l move_native_irq
> 54 void move_native_irq(int irq)
> 55 {
> 56      struct irq_desc *desc = irq_to_desc(irq);
> 57
> 58      if (likely(!(desc->status & IRQ_MOVE_PENDING)))
> 59              return;
> 60
> 61      if (unlikely(desc->status & IRQ_DISABLED))
> 62              return;
> 63
> 64      desc->chip->mask(irq);
> 65      move_masked_irq(irq);
> 66      desc->chip->unmask(irq);
> 67 }
>
> So this looks closely related to my panic: it is likely that
> irqbalance or something else is trying to move my interrupt from one
> core to another. Both the original issue and this one reproduce with
> LOTS of MSI-X vectors active.
>
> - I tried connecting after the panic with kgdboc, no connection
> - I tried kdump, but the same kernel I am using panics/hangs during
> boot right after udev during the kexec() kernel boot (should I try
> harder to get this working given it got so far?)
> - I have ftrace function tracer running but no way to get at the log
> post panic (wouldn't it be great if the kernel just dumped the ftrace
> log on __stack_chk_fail?)
>
> any other debugging tricks/ideas?

It seems CONFIG_CPUMASK_OFFSTACK (selected by CONFIG_MAXSMP) can change
something around this - did you try it?
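
For context, here is a rough userspace sketch (not kernel code) of why
CONFIG_CPUMASK_OFFSTACK could matter for a stack-corruption report. It
assumes a 4096-CPU configuration, and every name ending in _sketch is
made up for illustration rather than taken from the kernel's cpumask API.
It only shows how much of a function's stack frame a CPU mask occupies
in each case, and says nothing about where the corruption comes from.

```c
/* Userspace sketch only: *_sketch names are hypothetical, not kernel API. */
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed CPU count, chosen to resemble a CONFIG_MAXSMP-style build. */
#define NR_CPUS_SKETCH 4096
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS_SKETCH \
	((NR_CPUS_SKETCH + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH)

/* Fixed-size bitmap with one bit per possible CPU. */
struct cpumask_sketch {
	unsigned long bits[MASK_LONGS_SKETCH];
};

/* On-stack case: the whole bitmap is a local variable of the caller. */
size_t onstack_mask_bytes(void)
{
	struct cpumask_sketch mask = { { 0 } };

	return sizeof(mask);
}

/* Off-stack case: only a pointer is local; the bitmap lives on the heap. */
size_t offstack_local_bytes(void)
{
	struct cpumask_sketch *mask = calloc(1, sizeof(*mask));
	size_t local = sizeof(mask);	/* size of the pointer, not the bitmap */

	assert(mask != NULL);
	free(mask);
	return local;
}
```

With the assumed 4096 CPUs the on-stack variant puts NR_CPUS_SKETCH / 8
bytes into the frame for each mask, while the off-stack variant keeps
only a pointer there, so a test with the option flipped might change how
(or whether) the stack protector trips.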
Jarek P.