linux-kernel - Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork

Open Source and information security mailing list archives

Message-ID: <CAKfTPtAo1moOvR7a=rXA=_EEbs3vLp+97PBMTS61xG=u_pnMRg@mail.gmail.com>
Date:   Tue, 29 Nov 2016 14:04:27 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Morten Rasmussen <morten.rasmussen@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Wanpeng Li <kernellwp@...il.com>,
        Yuyang Du <yuyang.du@...el.com>,
        Mike Galbraith <umgwanakikbuti@...il.com>
Subject: Re: [PATCH 1/2 v2] sched: fix find_idlest_group for fork

On 29 November 2016 at 11:57, Morten Rasmussen <morten.rasmussen@....com> wrote:
> On Fri, Nov 25, 2016 at 04:34:32PM +0100, Vincent Guittot wrote:
>> During fork, the utilization of a task is initialized once the rq has been
>> selected, because the current utilization level of the rq is used to set
>> the utilization of the forked task. As the task's utilization is still
>> zero at this step of the fork sequence, it doesn't make sense to look for
>> spare capacity that can fit the task's utilization.
>> Furthermore, I can see perf regressions for the test "hackbench -P -g 1"
>> because the least loaded policy is always bypassed and tasks are not
>> spread during fork.
>
> Agreed, the late initialization of util_avg doesn't work very well with
> the spare capacity checking.
>
>> With this patch and the fix below, we are back to the same performance as
>> v4.8. The fix below is only a temporary one, used for the test until a
>> smarter solution is found, because we can't simply remove the test, which
>> is useful for other benchmarks.
>>
>> @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>>
>>       avg_cost = this_sd->avg_scan_cost;
>>
>> -     /*
>> -      * Due to large variance we need a large fuzz factor; hackbench in
>> -      * particularly is sensitive here.
>> -      */
>> -     if ((avg_idle / 512) < avg_cost)
>> -             return -1;
>> -
>>       time = local_clock();
>>
>>       for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
>
> I don't quite get this fix, but it is very likely because I haven't paid
> enough attention.
>
> Are you saying that removing the avg_cost check is improving hackbench
> performance? I thought it was supposed to help hackbench? I'm confused
> :-(

Yes, the avg_cost check prevents some task migrations at the end of the
test, when some threads have already finished their loop and left some
CPUs idle while other threads are still competing on the same CPUs.

>
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
>> ---
>>  kernel/sched/fair.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index aa47589..820a787 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5463,13 +5463,19 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>>        * utilized systems if we require spare_capacity > task_util(p),
>>        * so we allow for some task stuffing by using
>>        * spare_capacity > task_util(p)/2.
>> +      * spare capacity can't be used for fork because the utilization has
>> +      * not been set yet, as it needs a rq to initialize the utilization
>>        */
>> +     if (sd_flag & SD_BALANCE_FORK)
>> +             goto no_spare;
>> +
>>       if (this_spare > task_util(p) / 2 &&
>>           imbalance*this_spare > 100*most_spare)
>>               return NULL;
>>       else if (most_spare > task_util(p) / 2)
>>               return most_spare_sg;
>>
>> +no_spare:
>>       if (!idlest || 100*this_load < imbalance*min_load)
>>               return NULL;
>>       return idlest;
>
> Looks okay to me. We go back to using load, which is initialized,
> for fork decisions.
>
> Should we do the same for SD_BALANCE_EXEC?

I considered adding SD_BALANCE_EXEC as well, but decided to keep only
SD_BALANCE_FORK for now, as no regression has been reported for it so far.

>
> An alternative fix would be to move the utilization initialization
> before we pick the cpu, but that opens the whole discussion about what
> we should initialize it to again. So I'm fine with not going there now.
>
> Morten