Message-ID: <1425617559.16821.36.camel@gmx.de>
Date:	Fri, 06 Mar 2015 05:52:39 +0100
From:	Mike Galbraith <efault@....de>
To:	David Ahern <david.ahern@...cle.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: NMI watchdog triggering during load_balance

On Thu, 2015-03-05 at 21:05 -0700, David Ahern wrote:
> Hi Peter/Mike/Ingo:
> 
> I've been banging my head against this wall for a week now and hoping
> you or someone could shed some light on the problem.
> 
> On larger systems (256 to 1024 cpus) there are several use cases (e.g., 
> http://www.cs.virginia.edu/stream/) that regularly trigger the NMI 
> watchdog with the stack trace:
> 
> Call Trace:
> @  [000000000045d3d0] double_rq_lock+0x4c/0x68
> @  [00000000004699c4] load_balance+0x278/0x740
> @  [00000000008a7b88] __schedule+0x378/0x8e4
> @  [00000000008a852c] schedule+0x68/0x78
> @  [000000000042c82c] cpu_idle+0x14c/0x18c
> @  [00000000008a3a50] after_lock_tlb+0x1b4/0x1cc
> 
> Capturing data for all CPUs, I tend to see load_balance-related stack 
> traces on 700-800 cpus, with a few hundred blocked on _raw_spin_trylock_bh.
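> 
> For reference, the double_rq_lock() at the top of that trace takes the
> two runqueue locks in pointer order so that two CPUs locking the same
> pair cannot deadlock.  A minimal userspace paraphrase of the idea
> (names and pthread locks are mine, not the kernel's exact source):
> 
>     #include <pthread.h>
> 
>     struct rq {
>         pthread_mutex_t lock;
>         /* per-cpu runqueue state ... */
>     };
> 
>     /* Take both locks, lower address first, as the kernel does. */
>     static void double_rq_lock(struct rq *rq1, struct rq *rq2)
>     {
>         if (rq1 == rq2) {
>             pthread_mutex_lock(&rq1->lock);
>         } else if (rq1 < rq2) {
>             pthread_mutex_lock(&rq1->lock);
>             pthread_mutex_lock(&rq2->lock);
>         } else {
>             pthread_mutex_lock(&rq2->lock);
>             pthread_mutex_lock(&rq1->lock);
>         }
>     }
> 
> The ordering rules out deadlock but does nothing for contention: every
> newly-idle CPU that decides to pull still queues on the same victim's
> lock.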
> 
> I originally thought it was a deadlock in the rq locking, but if I
> bump the watchdog timeout the system eventually recovers (after
> 10-30+ seconds of unresponsiveness), so it does not seem likely to be
> a deadlock.
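> 
> For the record, "bumping the watchdog timeout" just means raising the
> threshold from its 10s default, e.g. (value arbitrary, assumes the
> generic lockup detector):
> 
>     # echo 60 > /proc/sys/kernel/watchdog_thresh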
> 
> This particular system has 1024 cpus:
> # lscpu
> Architecture:          sparc64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Big Endian
> CPU(s):                1024
> On-line CPU(s) list:   0-1023
> Thread(s) per core:    8
> Core(s) per socket:    4
> Socket(s):             32
> NUMA node(s):          4
> NUMA node0 CPU(s):     0-255
> NUMA node1 CPU(s):     256-511
> NUMA node2 CPU(s):     512-767
> NUMA node3 CPU(s):     768-1023
> 
> and there are 4 scheduling domains. An example of the domain debug 
> output (condensed for the email):
> 
> CPU970 attaching sched-domain:
>   domain 0: span 968-975 level SIBLING
>    groups: 8 single CPU groups
>    domain 1: span 968-975 level MC
>     groups: 1 group with 8 cpus
>     domain 2: span 768-1023 level CPU
>      groups: 4 groups with 256 cpus per group
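> 
> My working theory on why that hurts: at the 256-cpu CPU level, every
> newly-idle CPU scans the whole 256-cpu span for the busiest runqueue
> and then takes its lock.  A rough sketch of that scan -- structure
> only, mine, the real thing is load_balance()/find_busiest_queue() in
> kernel/sched/fair.c:
> 
>     #define SPAN_CPUS 256
> 
>     struct rq { int nr_running; };
>     static struct rq runqueues[SPAN_CPUS];
> 
>     /* Busiest cpu in the span, or -1 if nothing is worth pulling
>      * (a queue needs at least 2 tasks before one can migrate). */
>     static int find_busiest_cpu(void)
>     {
>         int cpu, busiest = -1, max = 1;
> 
>         for (cpu = 0; cpu < SPAN_CPUS; cpu++) {
>             if (runqueues[cpu].nr_running > max) {
>                 max = runqueues[cpu].nr_running;
>                 busiest = cpu;
>             }
>         }
>         return busiest;  /* caller then double_rq_lock()s both rqs */
>     }
> 
> Hundreds of CPUs doing that scan concurrently and then serializing on
> the same few rq locks looks a lot like the 10-30s stalls above.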

Wow, that topology is horrid.  I'm not surprised that your box is
writhing in agony.  Can you twiddle that?

	-Mike
