Date:	Mon, 02 May 2016 16:50:04 +0200
From:	Mike Galbraith <mgalbraith@...e.de>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Chris Mason <clm@...com>, Ingo Molnar <mingo@...nel.org>,
	Matt Fleming <matt@...eblueprint.co.uk>,
	linux-kernel@...r.kernel.org
Subject: Re: sched: tweak select_idle_sibling to look for idle threads

On Mon, 2016-05-02 at 10:46 +0200, Peter Zijlstra wrote:
> On Sun, May 01, 2016 at 09:12:33AM +0200, Mike Galbraith wrote:
> 
> > Nah, tbench is just variance-prone.  It got dinged up at clients=cores
> > on my desktop box; on 4 sockets the high end got seriously dinged up.
> 
> 
> Ha!  Check this:
> 
> root@...-ep:~# echo OLD_IDLE > /debug/sched_features ;
>   echo NO_ORDER_IDLE > /debug/sched_features ;
>   echo IDLE_CORE > /debug/sched_features ;
>   echo NO_FORCE_CORE > /debug/sched_features ;
>   tbench 20 -t 10
> 
> Throughput 5956.32 MB/sec  20 clients  20 procs  max_latency=0.126 ms
> 
> 
> root@...-ep:~# echo OLD_IDLE > /debug/sched_features ;
>   echo ORDER_IDLE > /debug/sched_features ;
>   echo IDLE_CORE > /debug/sched_features ;
>   echo NO_FORCE_CORE > /debug/sched_features ;
>   tbench 20 -t 10
> 
> Throughput 5011.86 MB/sec  20 clients  20 procs  max_latency=0.116 ms
> 
> 
> 
> That little ORDER_IDLE thing hurts silly. That's a little patch I had
> lying about because some people complained that tasks hop around the
> cache domain, instead of being stuck to a CPU.
> 
> I suspect what happens is that, with all CPUs starting to look for idle
> at the same place (the first cpu in the domain), they all find the same
> idle cpu and things pile up.
> 
> The old behaviour, where they all start iterating from where they were,
> avoids some of that, at the cost of making tasks hop around.
> 
> Let's see if I can get the same behaviour out of the cpumask iteration
> code..
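
For reference, the difference described above comes down to where the scan
over the LLC span starts.  A minimal sketch, illustrative only and not the
actual patch; the helpers are the usual scheduler ones, the function names
are made up:

static int scan_from_domain_start(struct task_struct *p,
                                  struct sched_domain *sd)
{
        int cpu;

        /*
         * Fixed starting point (the ORDER_IDLE flavour): every waker
         * begins at the first cpu of the LLC span, so under load they
         * all converge on the same idle cpu and pile up behind it.
         */
        for_each_cpu_and(cpu, sched_domain_span(sd), tsk_cpus_allowed(p)) {
                if (idle_cpu(cpu))
                        return cpu;
        }
        return -1;
}

static int scan_from_target(struct task_struct *p,
                            struct sched_domain *sd, int target)
{
        int cpu = target;
        int nr = cpumask_weight(sched_domain_span(sd));

        /*
         * Old behaviour: start from target (roughly where the task ran
         * last) and wrap around the span, so concurrent wakeups start
         * in different places and spread out, at the cost of tasks
         * hopping around the cache domain.
         */
        while (nr--) {
                cpu = cpumask_next(cpu, sched_domain_span(sd));
                if (cpu >= nr_cpu_ids)
                        cpu = cpumask_first(sched_domain_span(sd));
                if (idle_cpu(cpu) &&
                    cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
                        return cpu;
        }
        return -1;
}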

Order is one thing, but what the old behavior does first and foremost
is shut select_idle_sibling() down once the box starts getting really
busy: looking only at target's sibling stops the search instead of
letting it wreck things.  Once cores are moving, there are no large
piles of anything left to collect other than pain.

We really need a good way to know we're not gonna turn the box into a
shredder.  The wake_wide() thing might help some, though it likely wants
some twiddling; in_interrupt() might be another time to try hard.
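
For anyone not staring at fair.c: wake_wide() is the existing heuristic
that flags many-waker wakeups so the wakeup path gives up on affine
placement.  Roughly (paraphrased from memory, check the tree for the
exact form around this time):

static int wake_wide(struct task_struct *p)
{
        unsigned int master = current->wakee_flips;
        unsigned int slave = p->wakee_flips;
        int factor = this_cpu_read(sd_llc_size);

        /* Treat the heavier flipper as the master (waker of many). */
        if (master < slave)
                swap(master, slave);

        /* Only go wide when both flip often relative to LLC size. */
        if (slave < factor || master < slave * factor)
                return 0;
        return 1;
}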

Anyway, the has_idle_cores business seems to shut select_idle_sibling()
down rather nicely when the box gets busy.  Forcing either core,
target's sibling, or go fish turned in a top end win at 48 rq/socket.
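
The gist of that gating, as I read it (the shared-state layout and the
llc_shared_of() lookup below are assumed for illustration, not quoted
from the patch):

/*
 * One flag per LLC: set when a whole core (all SMT siblings) goes
 * idle, cleared when a full scan finds nothing.  Busy boxes keep it
 * clear, so wakeups skip the expensive scan and fall back to the
 * cheap "target's siblings only" check.
 */
struct llc_shared {                     /* assumed layout */
        atomic_t has_idle_cores;
};

static inline bool test_idle_cores(int cpu)
{
        /* llc_shared_of(): hypothetical per-LLC lookup */
        return atomic_read(&llc_shared_of(cpu)->has_idle_cores);
}

static inline void clear_idle_cores(int cpu)
{
        atomic_set(&llc_shared_of(cpu)->has_idle_cores, 0);
}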

Oh btw, did you know single socket boxen have no sd_busy?  That doesn't
look right.

fromm:~/:[0]# for i in 1 2 4 8 16 32 64 128 256; do tbench.sh $i 30 2>&1| grep Throughput; done
Throughput 511.016 MB/sec  1 clients  1 procs  max_latency=0.113 ms
Throughput 1042.03 MB/sec  2 clients  2 procs  max_latency=0.098 ms
Throughput 1953.12 MB/sec  4 clients  4 procs  max_latency=0.236 ms
Throughput 3694.99 MB/sec  8 clients  8 procs  max_latency=0.308 ms
Throughput 7080.95 MB/sec  16 clients  16 procs  max_latency=0.442 ms
Throughput 13444.7 MB/sec  32 clients  32 procs  max_latency=1.417 ms
Throughput 20191.3 MB/sec  64 clients  64 procs  max_latency=4.554 ms
Throughput 41115.4 MB/sec  128 clients  128 procs  max_latency=13.414 ms
Throughput 66844.4 MB/sec  256 clients  256 procs  max_latency=50.069 ms

        /*
         * If there are idle cores to be had, go find one.
         */
        if (sched_feat(IDLE_CORE) && test_idle_cores(target)) {
                i = select_idle_core(p, target);
                if ((unsigned)i < nr_cpumask_bits)
                        return i;

                /*
                 * Failed to find an idle core; stop looking for one.
                 */
                clear_idle_cores(target);
        }
#if 1
        for_each_cpu(i, cpu_smt_mask(target)) {
                if (idle_cpu(i))
                        return i;
        }

        return target;
#endif

        if (sched_feat(FORCE_CORE)) {
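
The select_idle_core() that snippet calls isn't shown above; a rough
sketch of the idea, as I read the approach (not the patch as posted;
the select_idle_mask scratch cpumask is assumed):

static int select_idle_core(struct task_struct *p, int target)
{
        struct sched_domain *sd = rcu_dereference(per_cpu(sd_llc, target));
        struct cpumask *cpus =
                this_cpu_cpumask_var_ptr(select_idle_mask); /* assumed scratch mask */
        int core, cpu;

        if (!sd)
                return -1;

        cpumask_and(cpus, sched_domain_span(sd), tsk_cpus_allowed(p));

        for_each_cpu(core, cpus) {
                bool idle = true;

                /* A core only counts if every SMT sibling is idle. */
                for_each_cpu(cpu, cpu_smt_mask(core)) {
                        cpumask_clear_cpu(cpu, cpus);
                        if (!idle_cpu(cpu))
                                idle = false;
                }
                if (idle)
                        return core;
        }

        /* Caller clears has_idle_cores when we come up empty. */
        return -1;
}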
