Message-ID: <1380179397.7525.45.camel@marge.simpson.net>
Date: Thu, 26 Sep 2013 09:09:57 +0200
From: Mike Galbraith <bitbucket@...ine.de>
To: Michael wang <wangyun@...ux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, Paul Turner <pjt@...gle.com>,
Rik van Riel <riel@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH] sched: Avoid select_idle_sibling() for
wake_affine(.sync=true)
On Thu, 2013-09-26 at 14:32 +0800, Michael wang wrote:
> On 09/26/2013 01:34 PM, Mike Galbraith wrote:
> > On Thu, 2013-09-26 at 13:12 +0800, Michael wang wrote:
> >> On 09/26/2013 11:41 AM, Mike Galbraith wrote:
> >> [snip]
> >>>> Like the case when we have:
> >>>>
> >>>> core0 sg core1 sg
> >>>> cpu0 cpu1 cpu2 cpu3
> >>>> waker busy idle idle
> >>>>
> >>>> If the sync wakeup was on cpu0, we can:
> >>>>
> >>>> 1. choose cpu in core1 sg like we did usually
> >>>> some overhead but tends to keep the load a little more balanced
> >>>> core0 sg core1 sg
> >>>> cpu0 cpu1 cpu2 cpu3
> >>>> idle busy wakee idle
> >>>
> >>> Reducing latency and increasing throughput when the waker isn't really
> >>> really going to immediately schedule off as the hint implies. Nice for
> >>> bursty loads and ramp.
> >>>
> >>> The breakeven point is going up though. If you don't have nohz
> >>> throttled, you eat tick start/stop overhead, and the menu governor
> >>> recently added yet more overhead, so maybe we should say hell with it.
> >>
> >> Exactly, more and more factors need to be considered; we say things get
> >> balanced, but actually it's not the best choice...
> >>
> >>>
> >>>> 2. choose cpu0 like the patch proposed
> >>>> no overhead but tends to make the load a little more unbalanced
> >>>> core0 sg core1 sg
> >>>> cpu0 cpu1 cpu2 cpu3
> >>>> wakee busy idle idle
> >>>>
> >>>> Maybe we should add a higher-scope load balance check in wake_affine(),
> >>>> but that means more overhead, which is just what the patch wants to
> >>>> reduce...
> >>>
> >>> Yeah, more overhead is the last thing we need.
> >>>
> >>>> What about some discount for the sync case inside select_idle_sibling()?
> >>>> For example, we could consider the sync cpu as idle and prefer it over the others?
> >>>
> >>> That's what the sync hint does. Problem is, it's a hint. If it were
> >>> truth, there would be no point in calling select_idle_sibling().
> >>
> >> Just wondering, if the hint is wrong most of the time, then why don't
> >> we remove it...
> >
> > For very fast/light network ping-pong micro-benchmarks, it is right.
> > For pipe-test, it's absolutely right, jabbering parties are 100%
> > synchronous, there is nada/nil/zip/diddly squat overlap reclaimable..
> > but in the real world, it ain't necessarily so.
> >
> >> Otherwise I think we can still utilize it to make decisions that tend
> >> to be correct, can't we?
> >
> > Sometimes :)
>
> Ok, a double-edged sword I see :)
>
> Maybe we can wield it carefully here, giving the discount to a bigger
> scope rather than just the sync cpu, for example:
>
> sg1 sg2
> cpu0 cpu1 cpu2 cpu3 cpu4 cpu5 cpu6 cpu7
> waker idle idle idle idle idle idle idle
>
> If it's a sync wakeup on cpu0 (the only waker), and the sg is wide enough
> that one cpu is not so influential, then supposing cpu0 to be idle should
> be safer, and preferring sg1 over sg2 is more likely to be right.
>
> And we can still choose an idle cpu at the final step, like cpu1 in this
> case, to avoid the risk that the waker doesn't schedule off as the hint
> said (see the sketch below).
>
> The key point is to reduce the influence of sync: trust it a little, but
> not totally ;-)
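
To make that concrete, here is a toy userspace model of the selection
logic being proposed (illustrative only, not kernel code; the fixed
topology, the name select_wakee_cpu(), and the fallback policy are all
assumptions for the sketch):

/*
 * Toy model: on a sync wakeup, prefer the waker's sched group, but
 * still fall back to an idle cpu inside it rather than trusting the
 * hint blindly; only stack on the waker if nothing in scope is idle.
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_CPUS		8
#define GROUP_WIDTH	4	/* cpus per sched group in this toy topology */

static bool cpu_idle[NR_CPUS] = {
	false, true, true, true,	/* sg1: cpu0 is the busy waker */
	true,  true, true, true,	/* sg2: all idle */
};

static int group_of(int cpu)
{
	return cpu / GROUP_WIDTH;
}

static int select_wakee_cpu(int waker_cpu, bool sync)
{
	int start = sync ? group_of(waker_cpu) * GROUP_WIDTH : 0;
	int end   = sync ? start + GROUP_WIDTH : NR_CPUS;

	/* Prefer an idle cpu; on sync wakeups only scan the waker's group. */
	for (int cpu = start; cpu < end; cpu++)
		if (cpu_idle[cpu])
			return cpu;

	/* Nothing idle in scope: trust the hint and stack on the waker. */
	return sync ? waker_cpu : 0;
}

int main(void)
{
	printf("sync wakeup from cpu0  -> cpu%d\n", select_wakee_cpu(0, true));
	printf("plain wakeup from cpu0 -> cpu%d\n", select_wakee_cpu(0, false));
	return 0;
}

With the topology above, the sync case picks cpu1 (idle, in the waker's
group) rather than stacking on cpu0, which is the "trust a little but
not totally" behaviour described.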
What we need is a dirt cheap way to fairly accurately predict overlap
potential (todo: write omniscience().. patent, buy planet).
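
One dirt cheap candidate along those lines (purely illustrative, in the
spirit of the scheduler's old avg_overlap heuristic; the field names and
threshold below are made up): keep a per-waker moving average of how long
it actually keeps running after a sync wakeup, and only trust the hint
when that average is short.

#include <stdio.h>
#include <stdbool.h>

struct waker_stats {
	unsigned long avg_overlap_ns;	/* moving average of post-wakeup runtime */
};

#define OVERLAP_THRESHOLD_NS	50000UL	/* 50us, arbitrary cutoff */

/* Called when the waker schedules off, 'ran_ns' after issuing the wakeup. */
static void update_overlap(struct waker_stats *ws, unsigned long ran_ns)
{
	/* Exponential moving average with weight 1/8: cheap to maintain. */
	ws->avg_overlap_ns = ws->avg_overlap_ns - ws->avg_overlap_ns / 8 + ran_ns / 8;
}

/* Believe the sync hint (and stack the wakee) only when overlap is tiny. */
static bool trust_sync_hint(const struct waker_stats *ws)
{
	return ws->avg_overlap_ns < OVERLAP_THRESHOLD_NS;
}

int main(void)
{
	struct waker_stats ws = { .avg_overlap_ns = 0 };

	/* Waker kept running ~200us after each of its last few sync wakeups. */
	for (int i = 0; i < 8; i++)
		update_overlap(&ws, 200000);

	/* Average is well above the cutoff, so the hint is not trusted. */
	printf("avg %lu ns, trust sync: %d\n", ws.avg_overlap_ns, trust_sync_hint(&ws));
	return 0;
}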
-Mike