Message-ID: <1425665511.7562.36.camel@gmx.de>
Date: Fri, 06 Mar 2015 19:11:51 +0100
From: Mike Galbraith <efault@....de>
To: David Ahern <david.ahern@...cle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: NMI watchdog triggering during load_balance
On Fri, 2015-03-06 at 08:01 -0700, David Ahern wrote:
> On 3/5/15 9:52 PM, Mike Galbraith wrote:
> >> CPU970 attaching sched-domain:
> >>  domain 0: span 968-975 level SIBLING
> >>   groups: 8 single CPU groups
> >>  domain 1: span 968-975 level MC
> >>   groups: 1 group with 8 cpus
> >>  domain 2: span 768-1023 level CPU
> >>   groups: 4 groups with 256 cpus per group
> >
> > Wow, that topology is horrid. I'm not surprised that your box is
> > writhing in agony. Can you twiddle that?
> >
>
> twiddle that how?
That was the question: _do_ you have any control? Because that topology
is toxic. I guess your reply means 'nope'.
> The system has 4 physical cpus (sockets). Each cpu has 32 cores with 8
> threads per core and each cpu has 4 memory controllers.
Thank god I've never met one of these, looks like the box from hell :)
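
For scale, a quick userspace back-of-the-envelope (not kernel code, just
the arithmetic from the figures in this thread: 4 sockets x 32 cores x 8
threads, and a domain-2 with 4 groups of 256 cpus). It's a simplification
of what find_busiest_group() ends up doing, i.e. gathering stats for every
CPU in every group of the domain being balanced:

/* Back-of-the-envelope only; figures taken from the thread above. */
#include <stdio.h>

int main(void)
{
	int sockets = 4, cores_per_socket = 32, threads_per_core = 8;
	int cpus = sockets * cores_per_socket * threads_per_core; /* 1024 */

	int groups = 4;                         /* domain 2, as reported */
	int cpus_per_group = cpus / groups;     /* 256 */

	printf("logical CPUs: %d\n", cpus);
	printf("CPUs scanned per balance pass at the CPU level: %d\n",
	       groups * cpus_per_group);
	return 0;
}

That's every one of the 1024 runqueues' worth of remote cachelines being
dragged around each time that level balances.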
> If I disable SCHED_MC and CGROUPS_SCHED (group scheduling) there is a
> noticeable improvement -- watchdog does not trigger and I do not get the
> rq locks held for 2-3 seconds. But there is still fairly high cpu usage
> for an idle system. Perhaps I should leave SCHED_MC on and disable
> SCHED_SMT; I'll try that today.
Well, if you disable SMT, your troubles _should_ shrink radically, as
your box does. You should probably also look into why you have CPU
domains at all; you don't ever want to see that on a NUMA box.
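
For reference, the knobs discussed above, assuming 'CGROUPS_SCHED' means
CONFIG_CGROUP_SCHED (CONFIG_FAIR_GROUP_SCHED depends on it and goes away
with it); the combination David says he'll try next would look roughly
like:

CONFIG_SCHED_MC=y
# CONFIG_SCHED_SMT is not set
# CONFIG_CGROUP_SCHED is not set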
-Mike