Date:	Wed, 1 Apr 2015 14:43:36 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Chris J Arges <chris.j.arges@...onical.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Rafael David Tinoco <inaddy@...ntu.com>,
	Peter Anvin <hpa@...or.com>,
	Jiang Liu <jiang.liu@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Gema Gomez <gema.gomez-solano@...onical.com>,
	the arch/x86 maintainers <x86@...nel.org>
Subject: Re: smp_call_function_single lockups


* Chris J Arges <chris.j.arges@...onical.com> wrote:

> Linus,
> 
> I had a few runs with your patch plus modifications, and got the following
> results (modified patch inlined below):
> 
> [   14.423916] ack_APIC_irq: vector = d1, irq = ffffffff
> [  176.060005] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:1630]
> 
> [   17.995298] ack_APIC_irq: vector = d1, irq = ffffffff
> [  182.993828] ack_APIC_irq: vector = e1, irq = ffffffff
> [  202.919691] ack_APIC_irq: vector = 22, irq = ffffffff
> [  484.132006] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:1586]
> 
> [   15.592032] ack_APIC_irq: vector = d1, irq = ffffffff
> [  304.993490] ack_APIC_irq: vector = e1, irq = ffffffff
> [  315.174755] ack_APIC_irq: vector = 22, irq = ffffffff
> [  360.108007] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ksmd:26]
> 
> [   15.026077] ack_APIC_irq: vector = b1, irq = ffffffff
> [  374.828531] ack_APIC_irq: vector = c1, irq = ffffffff
> [  402.965942] ack_APIC_irq: vector = d1, irq = ffffffff
> [  434.540814] ack_APIC_irq: vector = e1, irq = ffffffff
> [  461.820768] ack_APIC_irq: vector = 22, irq = ffffffff
> [  536.120027] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:4243]
> 
> [   17.889334] ack_APIC_irq: vector = d1, irq = ffffffff
> [  291.888784] ack_APIC_irq: vector = e1, irq = ffffffff
> [  297.824627] ack_APIC_irq: vector = 22, irq = ffffffff
> [  336.960594] ack_APIC_irq: vector = 42, irq = ffffffff
> [  367.012706] ack_APIC_irq: vector = 52, irq = ffffffff
> [  377.025090] ack_APIC_irq: vector = 62, irq = ffffffff
> [  417.088773] ack_APIC_irq: vector = 72, irq = ffffffff
> [  447.136788] ack_APIC_irq: vector = 82, irq = ffffffff
> -- stopped it since it wasn't reproducing / I was impatient --
> 
> So I'm seeing irq == VECTOR_UNDEFINED in all of these cases. Making
> (vector >= 0) didn't seem to expose any additional vectors.

So, these vectors do seem to be lining up with the pattern of how new 
irq vectors get assigned and how we slowly rotate through all 
available ones.
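
To make the pattern concrete, here is a rough sketch of the stepping I 
have in mind. It is an illustration only, not the kernel code: the 0xef 
upper limit and the function names are assumptions for the example (the 
real limit is determined at boot), and the 'used' set stands in for 
whatever vectors happen to be occupied.

```python
FIRST_EXTERNAL_VECTOR = 0x20  # first vector usable by device interrupts on x86
FIRST_SYSTEM_VECTOR = 0xEF    # assumed upper limit for this sketch only
STRIDE = 16                   # candidates are tried 16 vectors apart


def next_vector(vector, offset, used=frozenset(), limit=FIRST_SYSTEM_VECTOR):
    """Return the next free (vector, offset) pair after `vector`."""
    start = vector
    while True:
        vector += STRIDE
        if vector >= limit:
            # Ran past the usable range: bump the offset and wrap around.
            offset = (offset + 1) % STRIDE
            vector = FIRST_EXTERNAL_VECTOR + offset
        if vector == start:
            raise RuntimeError("no free vector")
        if vector not in used:
            return vector, offset


def rotation(vector, count, used=frozenset()):
    """List the next `count` vectors handed out after `vector`."""
    offset = vector % STRIDE
    out = []
    for _ in range(count):
        vector, offset = next_vector(vector, offset, used)
        out.append(vector)
    return out


if __name__ == "__main__":
    print([hex(v) for v in rotation(0xB1, 4)])
```

Starting from b1 this steps through c1, d1, e1 and then wraps to 22, 
which is the order seen in the logs above. The logs do not say why 32 
never shows up between 22 and 42; passing used={0x32} reproduces that 
skip in the sketch, but that is a guess.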

The VECTOR_UNDEFINED result might mean that we had already 'freed' 
that vector as part of the irq-move mechanism, even though it was 
likely in use shortly before.

So the irq-move code is not off the hook, quite the contrary.

Have you already tested whether the hang goes away if you remove 
irq-affinity fiddling daemons from the system? Do you have irqbalance 
installed or similar mechanisms?
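
A quick way to check is to scan /proc for a running daemon by its 
command name. This is a minimal sketch: the helper name is made up for 
the example, and 'irqbalance' is only the one daemon named above, so 
add whatever else your setup uses to set IRQ affinity.

```python
import os


def find_processes(names, proc_root="/proc"):
    """Return {pid: comm} for running processes whose comm is in `names`."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited or is not readable
        if comm in names:
            found[int(entry)] = comm
    return found


if __name__ == "__main__":
    print(find_processes({"irqbalance"}))
```

An empty result only says that no process with that exact name is 
running; it does not rule out other tools writing to 
/proc/irq/*/smp_affinity.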

Thanks,

	Ingo