linux-kernel - Re: timer: lockup in run_timer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <51DC9379.9050408@oracle.com>
Date:	Tue, 09 Jul 2013 18:49:29 -0400
From:	Sasha Levin <sasha.levin@...cle.com>
To:	Dave Jones <davej@...hat.com>, Tejun Heo <tj@...nel.org>,
	tglx@...utronix.de, Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>, trinity@...r.kernel.org
Subject: Re: timer: lockup in run_timer_softirq()

On 07/09/2013 06:47 PM, Dave Jones wrote:
> On Tue, Jul 09, 2013 at 06:35:27PM -0400, Sasha Levin wrote:
>   > Hi all,
>   >
>   > While fuzzing with trinity inside a KVM tools guest running latest -next, I've
>   > stumbled on the following spew:
>   >
>   > [ 2536.440007] BUG: soft lockup - CPU#0 stuck for 23s! [trinity-child86:12368]
>   > [ 2536.440007] Call Trace:
>   > [ 2536.440007]  <IRQ>
>   > [ 2536.440007]  [<ffffffff8113d180>] run_timer_softirq+0x2d0/0x330
>   > [ 2536.440007]  [<ffffffff8113b400>] ? lock_timer_base+0x70/0x70
>   > [ 2536.440007]  [<ffffffff81132b21>] __do_softirq+0x261/0x4d0
>   > [ 2536.440007]  [<ffffffff81132ed6>] irq_exit+0x86/0x120
>   > [ 2536.440007]  [<ffffffff841a43ea>] smp_apic_timer_interrupt+0x4a/0x60
>   > [ 2536.440007]  [<ffffffff841a2d32>] apic_timer_interrupt+0x72/0x80
>   > [ 2536.440007]  <EOI>
>   > [ 2536.440007]  [<ffffffff84198ef7>] ? retint_restore_args+0x13/0x13
>   > [ 2536.440007]  [<ffffffff812355e5>] ? user_enter+0x135/0x150
>   > [ 2536.440007]  [<ffffffff81077d6d>] syscall_trace_leave+0x12d/0x160
>   > [ 2536.440007]  [<ffffffff841a2337>] int_check_syscall_exit_work+0x34/0x3d
>   > [ 2536.440007] Code: 01 fd 48 89 df e8 45 5b 8f fd e8 b0 f1 00 fd 48 83 3d 30 e6 8a 01 00 75 0e 0f
>   > 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 fb 66 66 90 <66> 66 90 bf 01 00 00 00 e8 17 4f 00 00 65 48 8b
>   > 04 25 88 d9 00
>   >
>   > While going through the NMI dump, I noticed that it's very incomplete, and full of:
>   >
>   > [ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run:
>   > 697182.008 msecs
>
> I've been reporting these (and other traces) for a while https://lkml.org/lkml/2013/7/5/185 [1]
> on bare-metal.
>
> I've not managed to find time yet to try and narrow down the exact combination
> of trinity syscalls that causes this. Sometimes it happens quickly, sometimes
> after a few hours (even when run with the same seeds, it seems variable)

Interesting. It's the first time I'm seeing these, but I haven't really changed
anything in my configuration.

Are you also seeing the "NMI handler took too long" messages?


Thanks,
Sasha

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/