linux-kernel - Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE

Message-ID: <1435905658.6418.52.camel@gmail.com>
Date:	Fri, 03 Jul 2015 08:40:58 +0200
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	Josef Bacik <jbacik@...com>
Cc:	Peter Zijlstra <peterz@...radead.org>, riel@...hat.com,
	mingo@...hat.com, linux-kernel@...r.kernel.org,
	morten.rasmussen@....com, kernel-team <Kernel-team@...com>
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for
 BALANCE_WAKE

On Thu, 2015-07-02 at 13:44 -0400, Josef Bacik wrote:

> Now for 3.10 vs 4.0 our request duration time is the same if not 
> slightly better on 4.0, so once the workers are doing their job 
> everything is a-ok.
> 
> The problem is that the probability of select queue >= 1 is way different
> on 4.0 vs 3.10.  Normally this graph looks like an S: it's essentially 0
> up to some RPS (requests per second) threshold and then shoots up to 100%
> after the threshold.  I'll make a table of these graphs that hopefully
> makes sense.  The numbers are different from run to run because of
> traffic and such, but the test and control are both run at the same time.
> The header is the probability that the select queue is >= 1, and each
> cell is the RPS at which that probability is reached.
> 
> 		25%	50%	75%
> 4.0 plain: 	371	388	402
> control:	386	394	402
> difference:	15	6	0

So control is 3.10?  Virgin?

> So with 4.0 it's basically a straight line: at lower RPS we are getting a
> higher probability of a select queue >= 1.  We are measuring the average
> CPU delay (ms) from the scheduler netlink stuff, which is how I noticed
> it was scheduler related; our CPU delay is way higher on 4.0 than it is
> on 3.10 or on 4.0 with the wake idle patch.
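Assuming the "scheduler netlink stuff" here is the taskstats delay-accounting interface (the one the getdelays utility reads), the average CPU delay is the accumulated run-queue wait time divided by the number of delay samples. The sketch below computes it from two snapshots so two kernels can be compared over the same window. The struct is a hypothetical local stand-in that only mirrors the two real struct taskstats fields cpu_count and cpu_delay_total; it is not the full kernel struct, and avg_cpu_delay_ms() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two struct taskstats fields used here.
 * Defined locally so the sketch builds without <linux/taskstats.h>.
 */
struct cpu_delay_sample {
    uint64_t cpu_count;       /* number of delay samples (times the task was run) */
    uint64_t cpu_delay_total; /* total time spent waiting on a runqueue, in ns */
};

/*
 * Average CPU (runqueue) delay in milliseconds between two snapshots of
 * the same task. Using deltas keeps the measurement window identical for
 * test and control. Returns 0.0 if the task never ran in between.
 */
double avg_cpu_delay_ms(const struct cpu_delay_sample *before,
                        const struct cpu_delay_sample *after)
{
    /* Both counters only ever grow for a live task. */
    assert(after->cpu_count >= before->cpu_count);
    assert(after->cpu_delay_total >= before->cpu_delay_total);

    uint64_t runs = after->cpu_count - before->cpu_count;
    uint64_t delay_ns = after->cpu_delay_total - before->cpu_delay_total;

    if (runs == 0)
        return 0.0;
    return (double)delay_ns / (double)runs / 1e6; /* ns -> ms */
}
```

For example, going from 100 samples and 50 ms of total delay to 200 samples and 250 ms gives 200 ms over 100 runs, i.e. 2 ms average delay per run in that window.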
> 
> So the next test is NO_PREFER_IDLE.  This is slightly better than 4.0 plain:
> 		25%	50%	75%
> NO_PREFER_IDLE:	399	401	414
> control:	385	408	416
> difference:	14	7	2

Hm.  Throttling nohz may make a larger delta.  But never mind that.

> The numbers don't really show it well, but the graphs are closer
> together; it's slightly more S-shaped, but still not great.
> 
> Next is NO_WAKE_WIDE, which is horrible:
> 
> 		25%	50%	75%
> NO_WAKE_WIDE:	315	344	369
> control:	373	380	388
> difference:	58	36	19
> 
> This isn't even in the same ballpark; it's a way worse regression than
> 4.0 plain.

OK, this jibes perfectly with the 1:N waker/wakee thingy.

> The next bit is NO_WAKE_WIDE|NO_PREFER_IDLE, which is just as bad:
> 
> 		25%	50%	75%
> EVERYTHING:	327	360	383
> control:	381	390	399
> difference:	54	30	16

Ditto.
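To make the sign of each difference explicit, here is a small sketch that recomputes control minus test from the quoted RPS values, treating the control row as the baseline (the struct and function names are illustrative only). A positive value means the test kernel reaches that queueing probability at a lower RPS than the control, i.e. it starts queueing earlier. Computed this way, the NO_PREFER_IDLE entry at 25% comes out negative (399 vs 385), so at that point the test kernel was actually ahead of the control.

```c
#include <assert.h>
#include <stdio.h>

#define NPCT 3 /* columns: 25%, 50%, 75% probability that select queue >= 1 */

/* One test/control pair from the tables: RPS at which each probability was reached. */
struct run {
    const char *name;
    int test[NPCT];
    int control[NPCT];
};

static const struct run runs[] = {
    { "4.0 plain",      {371, 388, 402}, {386, 394, 402} },
    { "NO_PREFER_IDLE", {399, 401, 414}, {385, 408, 416} },
    { "NO_WAKE_WIDE",   {315, 344, 369}, {373, 380, 388} },
    { "EVERYTHING",     {327, 360, 383}, {381, 390, 399} },
};

/* control - test; positive means the test kernel queues at lower RPS (worse). */
static int signed_diff(const struct run *r, int col)
{
    assert(col >= 0 && col < NPCT);
    return r->control[col] - r->test[col];
}

/* Print one row of signed differences per configuration. */
void print_diffs(void)
{
    static const char *const hdr[NPCT] = { "25%", "50%", "75%" };

    printf("%-16s %5s %5s %5s\n", "config", hdr[0], hdr[1], hdr[2]);
    for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++) {
        printf("%-16s", runs[i].name);
        for (int c = 0; c < NPCT; c++)
            printf(" %5d", signed_diff(&runs[i], c));
        printf("\n");
    }
}
```

The rows come out as 15/6/0, -14/7/2, 58/36/19 and 54/30/16, so only the two NO_WAKE_WIDE variants lose ground at every percentile.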

Hm.  It seems what this load would like best is: if we detect 1:N, skip
all of the routine gyrations, i.e. move the N (workers) infrequently, and
expend search cycles frequently only on the 1 (dispatch).
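A toy userspace model of that idea, purely as a sketch: everything here (sim_task, record_wakeup, choose_policy, the factor threshold) is hypothetical and only similar in spirit to the kernel's per-task wakee_flips counter; it is not scheduler code or a proposed patch. Each task counts how often it switches wakeup partners. A dispatcher waking many workers flips constantly, while each worker keeps waking the same dispatcher, so a large flip imbalance marks the pair as 1:N. Waking one of the N then skips the search, and waking the 1 pays for a full idle-CPU search.

```c
#include <assert.h>

/* Hypothetical, simplified per-task wakeup bookkeeping (not kernel code). */
struct sim_task {
    int id;
    int last_wakee;           /* id of the task most recently woken, -1 if none */
    unsigned int wakee_flips; /* how many times the wakee changed */
};

enum wake_policy {
    WAKE_NORMAL,      /* not 1:N: use the usual wakeup placement path */
    WAKE_STAY_PUT,    /* wakee is one of the N: skip the search, reuse prev CPU */
    WAKE_FULL_SEARCH, /* wakee is the 1: spend cycles looking for an idle CPU */
};

void sim_task_init(struct sim_task *t, int id)
{
    t->id = id;
    t->last_wakee = -1;
    t->wakee_flips = 0;
}

/* Called each time waker wakes wakee: count partner switches. */
void record_wakeup(struct sim_task *waker, const struct sim_task *wakee)
{
    if (waker->last_wakee != wakee->id) {
        waker->last_wakee = wakee->id;
        waker->wakee_flips++;
    }
}

/*
 * Pick a policy for this wakeup. "factor" is a hypothetical tuning knob
 * (think: number of CPUs sharing a cache). The pair is treated as 1:N when
 * one side has switched partners at least factor times, and at least
 * factor times as often as the other side.
 */
enum wake_policy choose_policy(const struct sim_task *waker,
                               const struct sim_task *wakee,
                               unsigned int factor)
{
    assert(factor > 0);
    unsigned int a = waker->wakee_flips, b = wakee->wakee_flips;
    unsigned int hi = a > b ? a : b;
    unsigned int lo = a > b ? b : a;
    unsigned int lo_floor = lo ? lo : 1; /* avoid treating 0 flips as "infinitely" lopsided */

    if (hi < factor || hi < factor * lo_floor)
        return WAKE_NORMAL;
    /* The side with many flips is the dispatcher (the "1"). */
    return b == hi ? WAKE_FULL_SEARCH : WAKE_STAY_PUT;
}
```

With a dispatcher waking 8 workers round-robin 100 times and factor 4, the dispatcher ends up with 100 flips while each worker stays at 1, so waking a worker yields WAKE_STAY_PUT and waking the dispatcher yields WAKE_FULL_SEARCH; a 1:1 ping-pong pair stays WAKE_NORMAL. A real implementation would presumably also need to decay the counters over time, which this sketch does not do.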

Ponder..

	-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/