Date:	Thu, 5 May 2016 23:03:06 +0100
From:	Matt Fleming <matt@...eblueprint.co.uk>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Chris Mason <clm@...com>, Mike Galbraith <mgalbraith@...e.de>,
	Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: sched: tweak select_idle_sibling to look for idle threads

On Wed, 04 May, at 12:37:01PM, Peter Zijlstra wrote:
> 
> tbench wants select_idle_siblings() to just not exist; it goes happy
> when you just return target.

I've been playing with this patch a bit, hitting it with tbench on a
two-socket Xeon with 12 cores per socket and HT enabled (48 CPUs).

I see a throughput improvement for 16, 32, 64, 128 and 256 clients
compared against mainline, i.e.

  OLD_IDLE, ORDER_IDLE, NO_IDLE_CORE, NO_IDLE_CPU, NO_IDLE_SMT

  vs.

  NO_OLD_IDLE, NO_ORDER_IDLE, IDLE_CORE, IDLE_CPU, IDLE_SMT
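
For reference, here's roughly how I flip between those two sets through
the usual /sys/kernel/debug/sched_features interface, where writing FOO
enables a feature and NO_FOO disables it. The set_features() helper and
the hard-coded path are just my scaffolding, not anything from the
patch, and it needs root plus debugfs mounted:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: set_features() and the hard-coded path are my own
 * scaffolding; the feature names come from the patch under test. */
static const char *path = "/sys/kernel/debug/sched_features";

static void set_features(const char **names, int n)
{
	/* Write one feature name per open/write. */
	for (int i = 0; i < n; i++) {
		FILE *f = fopen(path, "w");

		if (!f) {
			perror(path);
			exit(1);
		}
		fprintf(f, "%s\n", names[i]);
		fclose(f);
	}
}

int main(int argc, char **argv)
{
	/* The two configurations compared above. */
	const char *old_set[] = { "OLD_IDLE", "ORDER_IDLE",
				  "NO_IDLE_CORE", "NO_IDLE_CPU", "NO_IDLE_SMT" };
	const char *new_set[] = { "NO_OLD_IDLE", "NO_ORDER_IDLE",
				  "IDLE_CORE", "IDLE_CPU", "IDLE_SMT" };

	if (argc > 1 && !strcmp(argv[1], "new"))
		set_features(new_set, 5);
	else
		set_features(old_set, 5);
	return 0;
}

echo from a shell does the same job, of course; the C version just
drops into a benchmark harness more easily.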

See,


 [OLD] Throughput 5345.6 MB/sec   16 clients  16 procs  max_latency=0.277 ms avg_latency=0.211853 ms
 [NEW] Throughput 5514.52 MB/sec  16 clients  16 procs  max_latency=0.493 ms avg_latency=0.176441 ms
 
 [OLD] Throughput 7401.76 MB/sec  32 clients  32 procs  max_latency=1.804 ms avg_latency=0.451147 ms
 [NEW] Throughput 10044.9 MB/sec  32 clients  32 procs  max_latency=3.421 ms avg_latency=0.582529 ms
 
 [OLD] Throughput 13265.9 MB/sec  64 clients  64 procs  max_latency=7.395 ms avg_latency=0.927147 ms
 [NEW] Throughput 13929.6 MB/sec  64 clients  64 procs  max_latency=7.022 ms avg_latency=1.017059 ms
 
 [OLD] Throughput 12827.8 MB/sec  128 clients  128 procs  max_latency=16.256 ms avg_latency=2.763706 ms
 [NEW] Throughput 13364.2 MB/sec  128 clients  128 procs  max_latency=16.630 ms avg_latency=3.002971 ms
 
 [OLD] Throughput 12653.1 MB/sec  256 clients  256 procs  max_latency=44.722 ms avg_latency=5.741647 ms
 [NEW] Throughput 12965.7 MB/sec  256 clients  256 procs  max_latency=59.061 ms avg_latency=8.699118 ms


For 1, 2, 4 and 8 clients the throughput results are more of a mixed
bag, with the old config sometimes winning and sometimes losing.


 [OLD] Throughput 488.819 MB/sec  1 clients  1 procs  max_latency=0.191 ms avg_latency=0.058794 ms
 [NEW] Throughput 486.106 MB/sec  1 clients  1 procs  max_latency=0.085 ms avg_latency=0.045794 ms
 
 [OLD] Throughput 925.987 MB/sec  2 clients  2 procs  max_latency=0.201 ms avg_latency=0.090882 ms
 [NEW] Throughput 954.944 MB/sec  2 clients  2 procs  max_latency=0.199 ms avg_latency=0.064294 ms
 
 [OLD] Throughput 1764.02 MB/sec  4 clients  4 procs  max_latency=0.160 ms avg_latency=0.075206 ms
 [NEW] Throughput 1756.8 MB/sec   4 clients  4 procs  max_latency=0.105 ms avg_latency=0.062382 ms
 
 [OLD] Throughput 3384.22 MB/sec  8 clients  8 procs  max_latency=0.276 ms avg_latency=0.099441 ms
 [NEW] Throughput 3375.47 MB/sec  8 clients  8 procs  max_latency=0.103 ms avg_latency=0.064176 ms


Looking at latency, the new code consistently performs worse at the
top end for 256 clients. Admittedly at that point the machine is
pretty overloaded. Things are much better at the lower end.

One thing I haven't done yet is twiddle the bits individually to see
what the best combination is. Have you settled on the right settings
yet?
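
Something like the sketch below is what I mean by twiddling the bits:
walk all 2^5 combinations of the five features, program each one via
sched_features as above, and fire off a tbench run per combination. The
tbench command line, the log naming and the 32-way loop are my own
scaffolding (tbench_srv has to be running already, and this needs
root), so treat it as a rough sketch rather than anything final:

#include <stdio.h>
#include <stdlib.h>

/* Sketch only: the sweep loop and the tbench invocation are my own
 * scaffolding; only the feature names come from the patch. */
static const char *feat[] = {
	"OLD_IDLE", "ORDER_IDLE", "IDLE_CORE", "IDLE_CPU", "IDLE_SMT"
};

int main(void)
{
	char cmd[256];

	for (int mask = 0; mask < 32; mask++) {
		/* Program this combination: set or clear each feature. */
		for (int bit = 0; bit < 5; bit++) {
			FILE *f = fopen("/sys/kernel/debug/sched_features", "w");

			if (!f) {
				perror("sched_features");
				return 1;
			}
			fprintf(f, "%s%s\n",
				(mask & (1 << bit)) ? "" : "NO_",
				feat[bit]);
			fclose(f);
		}
		/* One representative load (32 clients); log per combination. */
		snprintf(cmd, sizeof(cmd),
			 "tbench 32 localhost > tbench-mask-%02d.log 2>&1",
			 mask);
		if (system(cmd))
			fprintf(stderr, "mask %d: tbench failed\n", mask);
	}
	return 0;
}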
