Message-ID: <87prq79kty.fsf@skyscraper.fehenstaub.lan>
Date: Tue, 24 Jun 2008 12:41:13 +0200
From: Johannes Weiner <hannes@...urebad.de>
To: Vegard Nossum <vegard.nossum@...il.com>
Cc: a.p.zijlstra@...llo.nl, arjan@...ux.intel.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] softirq softlockup debugging
Hi Vegard,
Vegard Nossum <vegard.nossum@...il.com> writes:
> Hi,
>
> I'm debugging a problem with a softirq that gets stuck for a long time,
> so I wrote this patch to help find out what's going wrong.
>
> I actually think it can be useful in general as well, see for example
> http://www.kerneloops.org/search.php?search=__do_softirq&btnG=Function+Search
>
> ..and these cases are virtually impossible to debug since we don't know
> anything about *what* it was that got stuck. (The NMI watchdog could
> help, though.)
>
> The patch is #ifdef-ugly, I know... Suggestions are welcome.
>
>
> Vegard
>
>
> From: Vegard Nossum <vegard.nossum@...il.com>
> Date: Sun, 22 Jun 2008 14:12:31 +0200
> Subject: [PATCH] softirq softlockup debugging
>
> From the Kconfig: If a softlockup happens in a softirq, the softlockup
> stack trace is utterly unhelpful; it will only show the stack up to
> __do_softirq(), since this is where interrupts are reenabled.

After more staring at the code in question, I think that the approach is
not correct (or I didn't understand it, which is not unlikely).

I hunted down the address of the traces from kerneloops.org
(__do_softirq+0x6d) on a kernel image with a Fedora config and it's at
the local_irq_enable() right after the restart: label in __do_softirq().

So if the softirq handler had disabled interrupts, the softlockup would
still have been detected within the handler (when it reenables irqs and
the timer irq runs) and the stack frame should be there.

  do_softirq()
    local_irq_save()                  1)
    local_softirq_pending()
    __do_softirq()
      restart:                        2)
      local_irq_enable()              3)
      run a handler
      local_irq_disable()             4)
      jnz restart

So the lockup must be caused somewhere

    between 1) and 3)
  or
    between 4) and 3) [when we jump back]

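To make the reasoning above concrete, here is a small user-space model
of the control flow. It is only a sketch, not kernel code: every name in
it (model_do_softirq(), the stall_site and detect_site enums, and so on)
is made up for illustration, and the softlockup check is reduced to a
single held-off timer tick. It shows that a tick is only taken at the
local_irq_enable() in step 3) when the time was spent with irqs off
outside the handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these are the kernel's functions. */

enum stall_site {
    STALL_IRQS_OFF_BEFORE_ENABLE, /* between 1) and 3), or between 4) and 3) */
    STALL_IN_HANDLER_IRQS_ON,     /* handler spins with irqs enabled */
    STALL_IN_HANDLER_IRQS_OFF     /* handler disables irqs, spins, reenables */
};

enum detect_site {
    DETECTED_NOWHERE,
    DETECTED_AT_IRQ_ENABLE,       /* the local_irq_enable() at step 3) */
    DETECTED_IN_HANDLER,
    DETECTED_AT_IRQ_RESTORE
};

static bool irqs_on;
static bool tick_pending;
static enum detect_site detected_at;

/* The (modelled) timer irq runs and notices the stall. */
static void take_tick(enum detect_site where)
{
    assert(irqs_on);              /* a tick is never taken with irqs off */
    if (detected_at == DETECTED_NOWHERE)
        detected_at = where;      /* remember only the first detection */
    tick_pending = false;
}

static void model_irq_disable(void)
{
    irqs_on = false;
}

static void model_irq_enable(enum detect_site where)
{
    irqs_on = true;
    if (tick_pending)             /* a held-off tick fires right here */
        take_tick(where);
}

/* Time passes at this point: the tick is taken now or held off. */
static void model_stall(enum detect_site where)
{
    if (irqs_on)
        take_tick(where);
    else
        tick_pending = true;
}

static void model_handler(enum stall_site site)
{
    if (site == STALL_IN_HANDLER_IRQS_ON) {
        model_stall(DETECTED_IN_HANDLER);
    } else if (site == STALL_IN_HANDLER_IRQS_OFF) {
        model_irq_disable();
        model_stall(DETECTED_IN_HANDLER);
        model_irq_enable(DETECTED_IN_HANDLER);
    }
}

/* One pass through the modelled loop; returns where the tick was taken. */
enum detect_site model_do_softirq(enum stall_site site)
{
    irqs_on = true;
    tick_pending = false;
    detected_at = DETECTED_NOWHERE;

    model_irq_disable();                          /* 1) */
    if (site == STALL_IRQS_OFF_BEFORE_ENABLE)
        model_stall(DETECTED_AT_IRQ_ENABLE);      /* irqs off: tick held off */
    model_irq_enable(DETECTED_AT_IRQ_ENABLE);     /* 3) */
    model_handler(site);                          /* run a handler */
    model_irq_disable();                          /* 4) */
    model_irq_enable(DETECTED_AT_IRQ_RESTORE);    /* back in the caller */
    return detected_at;
}
```

In this model, model_do_softirq(STALL_IRQS_OFF_BEFORE_ENABLE) reports
the detection at the irq-enable point, while both handler cases report
it inside the handler, where the handler's stack frame would still be
present.
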
These functions are in the path and possible candidates for causing it:
- local_softirq_pending()
- account_system_vtime()
- __local_bh_disable()
- trace_softirq_enter()
- smp_processor_id()
- set_softirq_pending()

What do you think? You said you already used your patch for debugging
lockups in softirq handlers, so I am confused about why the stack frame
of the handler was no longer present.

Hannes