linux-kernel - Re: timer: lockup in run_timer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130709224706.GA13855@redhat.com>
Date:	Tue, 9 Jul 2013 18:47:06 -0400
From:	Dave Jones <davej@...hat.com>
To:	Sasha Levin <sasha.levin@...cle.com>
Cc:	Tejun Heo <tj@...nel.org>, tglx@...utronix.de,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>, trinity@...r.kernel.org
Subject: Re: timer: lockup in run_timer_softirq()

On Tue, Jul 09, 2013 at 06:35:27PM -0400, Sasha Levin wrote:
 > Hi all,
 > 
 > While fuzzing with trinity inside a KVM tools guest running latest -next, I've
 > stumbled on the following spew:
 > 
 > [ 2536.440007] BUG: soft lockup - CPU#0 stuck for 23s! [trinity-child86:12368]
 > [ 2536.440007] Call Trace:
 > [ 2536.440007]  <IRQ>
 > [ 2536.440007]  [<ffffffff8113d180>] run_timer_softirq+0x2d0/0x330
 > [ 2536.440007]  [<ffffffff8113b400>] ? lock_timer_base+0x70/0x70
 > [ 2536.440007]  [<ffffffff81132b21>] __do_softirq+0x261/0x4d0
 > [ 2536.440007]  [<ffffffff81132ed6>] irq_exit+0x86/0x120
 > [ 2536.440007]  [<ffffffff841a43ea>] smp_apic_timer_interrupt+0x4a/0x60
 > [ 2536.440007]  [<ffffffff841a2d32>] apic_timer_interrupt+0x72/0x80
 > [ 2536.440007]  <EOI>
 > [ 2536.440007]  [<ffffffff84198ef7>] ? retint_restore_args+0x13/0x13
 > [ 2536.440007]  [<ffffffff812355e5>] ? user_enter+0x135/0x150
 > [ 2536.440007]  [<ffffffff81077d6d>] syscall_trace_leave+0x12d/0x160
 > [ 2536.440007]  [<ffffffff841a2337>] int_check_syscall_exit_work+0x34/0x3d
 > [ 2536.440007] Code: 01 fd 48 89 df e8 45 5b 8f fd e8 b0 f1 00 fd 48 83 3d 30 e6 8a 01 00 75 0e 0f 
 > 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 fb 66 66 90 <66> 66 90 bf 01 00 00 00 e8 17 4f 00 00 65 48 8b 
 > 04 25 88 d9 00
 > 
 > While going through the NMI dump, I noticed that it's very incomplete, and full of:
 > 
 > [ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 
 > 697182.008 msecs

I've been reporting these (and other traces) for a while https://lkml.org/lkml/2013/7/5/185 [1]
on bare-metal.

I've not managed to find time yet to try and narrow down the exact combination
of trinity syscalls that causes this. Sometimes it happens quickly, sometimes
after a few hours (even when run with the same seeds, it seems variable)

	Dave
 
[1] lkml.org seems to be down right now. again. Sigh.
  See
  Subject: Yet more softlockups.
   and
  Subject: scheduling while atomic & hang


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/