Date:	Tue, 24 Jun 2008 12:41:13 +0200
From:	Johannes Weiner <hannes@...urebad.de>
To:	Vegard Nossum <vegard.nossum@...il.com>
Cc:	a.p.zijlstra@...llo.nl, arjan@...ux.intel.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] softirq softlockup debugging

Hi Vegard,

Vegard Nossum <vegard.nossum@...il.com> writes:

> Hi,
>
> I'm debugging a problem with a softirq that gets stuck for a long time,
> so I wrote this patch to help find out what's going wrong.
>
> I actually think it can be useful in general as well, see for example
> http://www.kerneloops.org/search.php?search=__do_softirq&btnG=Function+Search
>
> ..and these cases are virtually impossible to debug since we don't know
> anything about *what* it was that got stuck. (The NMI watchdog could
> help, though.)
>
> The patch is #ifdef-ugly, I know... Suggestions are welcome.
>
>
> Vegard
>
>
> From: Vegard Nossum <vegard.nossum@...il.com>
> Date: Sun, 22 Jun 2008 14:12:31 +0200
> Subject: [PATCH] softirq softlockup debugging
>
> From the Kconfig: If a softlockup happens in a softirq, the softlockup
> stack trace is utterly unhelpful; it will only show the stack up to
> __do_softirq(), since this is where interrupts are reenabled.

After more staring at the code in question, I think that the approach is
not correct (or I didn't understand it, which is not unlikely).

I hunted down the address from the kerneloops.org traces
(__do_softirq+0x6d) in a kernel image built with a Fedora config, and it
is at the local_irq_enable() right after the restart: label in
__do_softirq().

So if the softirq handler itself had disabled interrupts, the softlockup
would still have been detected within the handler (when it reenables
irqs and the timer irq runs), and the handler's stackframe should be
there.

do_softirq()
  local_irq_save()			1)
  local_softirq_pending()
  __do_softirq()
   restart:				2)
    local_irq_enable()			3)
    run a handler
    local_irq_disable()			4)
    jnz restart

So the lockup must be caused somewhere
  between 1) and 3)
or
  between 4) and 3) [when we jump back]
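
To make the reasoning above concrete, here is a rough userspace toy model
(not kernel code; every sim_* name is made up for this sketch). It assumes
only that the timer tick, and with it the softlockup check, can run solely
while irqs are enabled, so a stall in an irqs-off region is reported at
the next local_irq_enable():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy userspace model; none of these names exist in the kernel. */
static bool irqs_enabled;
static bool tick_pending;        /* timer tick waiting to be delivered */
static const char *reported_at;  /* where the softlockup check first ran */

/* The tick (and thus the softlockup check) runs only with irqs on. */
static void sim_deliver_tick(const char *where)
{
	if (tick_pending && irqs_enabled && reported_at == NULL) {
		reported_at = where;
		tick_pending = false;
	}
}

static void sim_irq_disable(void)
{
	irqs_enabled = false;
}

static void sim_irq_enable(const char *where)
{
	irqs_enabled = true;
	sim_deliver_tick(where);
}

/* Spend "too long" at one spot: a tick becomes pending meanwhile. */
static void sim_stall(const char *where)
{
	tick_pending = true;
	sim_deliver_tick(where);
}

/* Returns the spot where the lockup would be reported, or NULL. */
const char *sim_do_softirq(bool stall_irqs_off, bool stall_in_handler)
{
	irqs_enabled = true;
	tick_pending = false;
	reported_at = NULL;

	sim_irq_disable();                   /* 1) */
	if (stall_irqs_off)
		sim_stall("irqs-off path");
	sim_irq_enable("local_irq_enable");  /* 3) */
	if (stall_in_handler)
		sim_stall("handler");        /* handler runs with irqs on */
	sim_irq_disable();                   /* 4) */
	return reported_at;
}

/* The two cases from the argument above, plus the no-stall case. */
void sim_check(void)
{
	assert(strcmp(sim_do_softirq(true, false), "local_irq_enable") == 0);
	assert(strcmp(sim_do_softirq(false, true), "handler") == 0);
	assert(sim_do_softirq(false, false) == NULL);
}
```

In this model a stall with irqs off always surfaces at 3), while a stall
inside a handler surfaces in the handler, which is what I would expect the
real traces to show as well.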

These functions are in the path and possible candidates for causing it:

- local_softirq_pending()
- account_system_vtime()
- __local_bh_disable()
- trace_softirq_enter()
- smp_processor_id()
- set_softirq_pending()

What do you think?  You said you actually used your patch already for
debugging lockups in softirq handlers, so it confuses me why the
stackframe of the handler was no longer present.

	Hannes