[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtBx2ZASN4SHQmnA=_E_TaQ6p1Nq-Xc9NHRGS==rOpSgLw@mail.gmail.com>
Date: Mon, 3 Oct 2016 15:05:45 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Matt Fleming <matt@...eblueprint.co.uk>,
Ingo Molnar <mingo@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Mike Galbraith <umgwanakikbuti@...il.com>,
Yuyang Du <yuyang.du@...el.com>
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
On 29 September 2016 at 18:15, Dietmar Eggemann
<dietmar.eggemann@....com> wrote:
> On 28/09/16 14:13, Vincent Guittot wrote:
>> Le Wednesday 28 Sep 2016 à 05:27:54 (-0700), Vincent Guittot a écrit :
>>> On 28 September 2016 at 04:31, Dietmar Eggemann
>>> <dietmar.eggemann@....com> wrote:
>>>> On 28/09/16 12:19, Peter Zijlstra wrote:
>>>>> On Wed, Sep 28, 2016 at 12:06:43PM +0100, Dietmar Eggemann wrote:
>>>>>> On 28/09/16 11:14, Peter Zijlstra wrote:
>>>>>>> On Fri, Sep 23, 2016 at 12:58:08PM +0100, Matt Fleming wrote:
>
> [...]
>
>> IIUC the problem raised by Matt, he see a regression because we now remove
>> during the dequeue the exact same load as during the enqueue so
>> cfs_rq->runnable_load_avg is null so we select a cfs_rq that might already have
>> a lot of hackbench blocked thread.
>
> This is my understanding as well.
>
>> The fact that runnable_load_avg is null, when the cfs_rq doesn't have runnable
>> task, is quite correct and we should keep it. But when we look for the idlest
>> group, we have to take into account the blocked thread.
>>
>> That's what i have tried to do below
>
> [...]
>
>> + /*
>> + * In case that we have same runnable load (especially null
>> + * runnable load), we select the group with smallest blocked
>> + * load
>> + */
>> + min_avg_load = avg_load;
>> + min_runnable_load = runnable_load;
>
> Setting 'min_runnable_load' wouldn't be necessary here.
fair enough
>
>> idlest = group;
>> }
>> +
>> } while (group = group->next, group != sd->groups);
>>
>> - if (!idlest || 100*this_load < imbalance*min_load)
>> + if (!idlest || 100*this_load < imbalance*min_runnable_load)
>> return NULL;
>> return idlest;
>
> On the Hikey board (ARM64) (2 cluster, each 4 cpu's, so MC and DIE), the
> first f_i_g (on DIE) is still based on rbl_load. So if the first
> hackbench task (spawning all the worker task) runs on cluster1, and the
> former worker p_X already blocks f_i_g returns cluster2, if p_X still
> runs, it returns idlest=NULL and we continue with cluster1 for second
> f_i_g on MC.
>
> The additional 'else if' condition doesn't seem to help much because of
> occurrences where an idle cpu (which never took a worker) still has a
> small value of rbl_load (shouldn't actually happen, weighted_cpuload()
> should be 0) so it is never chosen or it has even a negative impact in
> the case where an idle cpu (which never took a worker) is not chosen
> because its load (cfs->avg.load_avg) hasn't been updated for a long time
> so another cpu with rbl_load = 0 and a smaller load is used (even though
> a lot of worker where already placed on it).
So the elseif part is there to take care of the regression raised by
Matt where the runnable_load_avg is null because worker are blocked
and the same cpu is selected
This can be extended with a threshold in order to include small
differences that came from computation rounding
>
> There are also episodes where we 'pack' workers onto the cpu which is
> initially picked in f_i_c (on DIE) because (100*this_load <
> imbalance*min_load) is true in f_i_g on MC. Maybe we can get rid of this
> for !sd->child ?
This threshold is there to filter any small variations that are not
relevant. I'm going to extend the use of cfs_rq_load_avg() in all
conditions so we take into account blocked load everywhere instead of
only when runnable_load_avg is null
>
> [...]
Powered by blists - more mailing lists