Message-ID: <CAKfTPtABaxDssuQ4or59mf6xcVBvFc4R+uDzn+1ESRSYKTvb1g@mail.gmail.com>
Date: Tue, 18 Oct 2016 17:19:22 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Matt Fleming <matt@...eblueprint.co.uk>,
Wanpeng Li <kernellwp@...il.com>,
Ingo Molnar <mingo@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Mike Galbraith <umgwanakikbuti@...il.com>,
Yuyang Du <yuyang.du@...el.com>,
Dietmar Eggemann <dietmar.eggemann@....com>
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
On 18 October 2016 at 13:09, Peter Zijlstra <peterz@...radead.org> wrote:
> On Wed, Oct 12, 2016 at 09:41:36AM +0200, Vincent Guittot wrote:
>
>> ok. In fact, I have noticed another regression with tip/sched/core and
>> hackbench while looking at yours.
>> I have bisected it to:
>> 10e2f1acd0 ("sched/core: Rewrite and improve select_idle_siblings")
>>
>> hackbench -P -g 1
>>
>>          v4.8        tip/sched/core   tip/sched/core+revert 10e2f1acd010
>>                                       and 1b568f0aabf2
>> min      0.051       0.052            0.049
>> avg      0.057(0%)   0.062(-7%)       0.056(+1%)
>> max      0.070       0.073            0.067
>> stdev    +/-8%       +/-10%           +/-9%
>>
>> The issue seems to be that it prevents some migrations at wakeup at
>> the end of the hackbench test, so the last tasks compete for the
>> same CPU while other CPUs in the same MC domain are idle. I haven't
>> yet looked more deeply into which part of the patch causes the regression.
>
> So select_idle_cpu(), which does the LLC-wide CPU scan, is now throttled
> by a comparison between avg_cost and avg_idle, where avg_cost is a
> historical measure of how costly it was to scan the entire LLC domain
> and avg_idle is our current idle time guesstimate (also a historical
> average).
>
> The problem was that a number of workloads were spending quite a lot of
> time here scanning CPUs while they could be doing useful work (esp.
> since newer parts have silly amounts of CPUs per LLC).
Makes sense.
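
To make sure I read the throttle correctly, here is a minimal userspace
sketch of the comparison described above. It is not the kernel code:
scan_is_throttled() and its fuzz parameter are names made up for this
illustration, and only avg_idle and avg_cost come from the description.
The idea is that the scan is skipped when the expected idle time, scaled
down by a fuzz factor, is smaller than the historical scan cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function.
 *
 * avg_idle: estimated idle time of the current CPU (ns)
 * avg_cost: historical cost of scanning the whole LLC domain (ns)
 * fuzz:     assumed scaling factor that absorbs the large variance of
 *           both averages (the "random number" in the heuristic)
 *
 * Returns true when the LLC-wide scan should be skipped.
 */
static bool scan_is_throttled(uint64_t avg_idle, uint64_t avg_cost,
			      uint64_t fuzz)
{
	assert(fuzz > 0);	/* avoid dividing by zero */

	return (avg_idle / fuzz) < avg_cost;
}
```

With a large fuzz value the scan is only allowed when the CPU expects to
stay idle for much longer than a full scan takes, which would explain why
short-lived wakeups at the end of hackbench no longer spread across the
idle CPUs.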
>
> The toggle is a heuristic with a random number in it. We could see if
> there's anything better we can do. I know some people take the toggle
> out entirely, but that will regress other workloads.
OK, so removing the toggle fixes the problem in this test case too.
Maybe we can take sd_llc_size into account in the toggle.
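
As a rough illustration of what I mean, and nothing more than that: the
sketch below scales the fuzz factor with the number of CPUs in the LLC,
so that a small LLC (where a full scan is cheap) is throttled less
aggressively than a large one. BASE_FUZZ, REF_LLC_SIZE and both function
names are assumptions invented for this example; only sd_llc_size,
avg_idle and avg_cost are names from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASE_FUZZ	512u	/* assumed fuzz factor for a reference-sized LLC */
#define REF_LLC_SIZE	64u	/* assumed "large" LLC the base factor is tuned for */

/* Hypothetical: shrink the fuzz factor linearly for smaller LLCs. */
static uint64_t llc_scaled_fuzz(unsigned int sd_llc_size)
{
	uint64_t fuzz;

	assert(sd_llc_size > 0);

	fuzz = (uint64_t)BASE_FUZZ * sd_llc_size / REF_LLC_SIZE;

	return fuzz ? fuzz : 1;	/* never return 0, it is used as a divisor */
}

/* Same comparison as the existing toggle, but with the scaled factor. */
static bool scan_is_throttled_llc(uint64_t avg_idle, uint64_t avg_cost,
				  unsigned int sd_llc_size)
{
	return (avg_idle / llc_scaled_fuzz(sd_llc_size)) < avg_cost;
}
```

With these made-up constants, the same avg_idle and avg_cost that throttle
the scan on a 64-CPU LLC would still allow it on a 4-CPU MC domain. Whether
a linear scaling is the right shape would need to be checked against the
workloads that motivated the toggle.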