Date:	Mon, 4 Apr 2016 10:44:16 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Jiri Olsa <jolsa@...hat.com>
Cc:	Ingo Molnar <mingo@...nel.org>,
	James Hartsock <hartsjc@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	Kirill Tkhai <ktkhai@...allels.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC] sched: unused cpu in affine workload

On Mon, Apr 04, 2016 at 10:23:02AM +0200, Jiri Olsa wrote:
> hi,
> we've noticed the following issue in one of our workloads.
> 
> I have a 24-CPU server with the following sched domains:
>   domain 0: (pairs)
>   domain 1: 0-5,12-17 (group1)  6-11,18-23 (group2)
>   domain 2: 0-23 level NUMA
> 
> I run a CPU-hogging workload on the following CPUs:
>   4,6,14,18,19,20,23
> 
> that is:
>   4,14          CPUs from group1
>   6,18,19,20,23 CPUs from group2
> 
> the workload process gets its affinity set up via 'taskset -c ${CPUs workload ...'
> and forks a child for every CPU
> 
> very often we notice CPUs 4 and 14 running 3 of the workload's processes
> while CPUs 6,18,19,20,23 are running just 4 processes, leaving one of the
> CPUs from group2 idle
> 
> AFAICS from the code, the reason for this is that load balancing
> follows the sched domain setup (topology) and does not take affinity
> setups like this into account. The code in find_busiest_group, running
> on the idle CPU from group2, will find group1 as the busiest group, but
> group1's average load (roughly 3 tasks over its 12 CPUs vs. 4 over
> group2's 12) is smaller than the local group's, so no tasks get pulled.
> 
> It's obvious that the load balancer follows the sched domain topology.
> However, is there some sched feature I'm missing that could help
> with this? Or do we need to follow the sched domain topology when
> we select CPUs for the workload to get even balancing?

Yeah, this is 'hard'; there is some code that tries not to blow up
completely with this, but it's all a bit of a mess. See
kernel/sched/fair.c:sg_imbalanced().

The easiest solution is to simply not do this and stick with the
topology, like you suggest.

So far I've not come up with a sane/stable solution for this problem.
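
For anyone who wants to poke at this, below is a minimal, untested sketch of
the kind of reproducer Jiri describes: it forks one CPU-bound child per CPU
in the affinity mask it inherits from taskset and leaves placement to the
scheduler. The program name (affine_hog) and the code are hypothetical, not
the actual test case.

#define _GNU_SOURCE		/* CPU_COUNT() and the cpu_set_t macros */
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	cpu_set_t allowed;
	int nr, i;

	/* Inherit whatever mask taskset handed us. */
	if (sched_getaffinity(0, sizeof(allowed), &allowed)) {
		perror("sched_getaffinity");
		return 1;
	}
	nr = CPU_COUNT(&allowed);

	/* One hog per allowed CPU; placement is left to the load balancer. */
	for (i = 0; i < nr; i++) {
		if (fork() == 0) {
			volatile unsigned long x = 0;

			for (;;)
				x++;	/* spin forever */
		}
	}

	while (wait(NULL) > 0)
		;	/* hogs never exit; stop the run with ^C */
	return 0;
}

Run it as 'taskset -c 4,6,14,18,19,20,23 ./affine_hog' and watch per-CPU
utilization; per the report above, one of the group2 CPUs tends to stay idle
while CPUs 4 and 14 each run more than one hog.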

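As a concrete example of sticking with the topology: with the domains above,
all seven workers fit inside group2 (CPUs 6-11 plus 18), so the affinity mask
lines up with a single scheduling group. A hypothetical helper that sets such
a mask in the workload itself (instead of relying on taskset) could look like:

#define _GNU_SOURCE
#include <sched.h>

/* CPUs taken from a single domain-1 group only (group2: 6-11,18-23). */
static const int workload_cpus[] = { 6, 7, 8, 9, 10, 11, 18 };

int pin_to_one_group(void)
{
	cpu_set_t mask;
	unsigned int i;

	CPU_ZERO(&mask);
	for (i = 0; i < sizeof(workload_cpus) / sizeof(workload_cpus[0]); i++)
		CPU_SET(workload_cpus[i], &mask);

	/* pid 0 == calling thread; children inherit the mask across fork(). */
	return sched_setaffinity(0, sizeof(mask), &mask);
}

Calling this in the parent before the fork loop in the previous sketch keeps
every hog inside one scheduling group, so balancing within that group can
spread them evenly, which is the "follow the topology when selecting CPUs"
approach Jiri asks about.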