linux-kernel - Re: [PATCH 2/2 v2] sched: use load


Message-ID: <CAKfTPtBMeL9Omvj+KzL2KAhTH8rjz5BzPypaj6DmCXn0ykZpWg@mail.gmail.com>
Date:   Wed, 30 Nov 2016 14:49:11 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Morten Rasmussen <morten.rasmussen@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Wanpeng Li <kernellwp@...il.com>,
        Yuyang Du <yuyang.du@...el.com>,
        Mike Galbraith <umgwanakikbuti@...il.com>
Subject: Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

On 30 November 2016 at 13:49, Morten Rasmussen <morten.rasmussen@....com> wrote:
> On Fri, Nov 25, 2016 at 04:34:33PM +0100, Vincent Guittot wrote:
>> find_idlest_group() only compares the runnable_load_avg when looking for
>> the least loaded group. But on fork intensive use case like hackbench

[snip]

>> +                             min_avg_load = avg_load;
>> +                             idlest = group;
>> +                     } else if ((runnable_load < (min_runnable_load + imbalance)) &&
>> +                                     (100*min_avg_load > imbalance_scale*avg_load)) {
>> +                             /*
>> +                              * The runnable loads are close so we take
>> +                              * into account blocked load through avg_load
>> +                              *  which is blocked + runnable load
>> +                              */
>> +                             min_avg_load = avg_load;
>>                               idlest = group;
>>                       }
>>
>> @@ -5470,13 +5495,16 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>>               goto no_spare;
>>
>>       if (this_spare > task_util(p) / 2 &&
>> -         imbalance*this_spare > 100*most_spare)
>> +         imbalance_scale*this_spare > 100*most_spare)
>>               return NULL;
>>       else if (most_spare > task_util(p) / 2)
>>               return most_spare_sg;
>>
>>  no_spare:
>> -     if (!idlest || 100*this_load < imbalance*min_load)
>> +     if (!idlest ||
>> +         (min_runnable_load > (this_runnable_load + imbalance)) ||
>> +         ((this_runnable_load < (min_runnable_load + imbalance)) &&
>> +                     (100*min_avg_load > imbalance_scale*this_avg_load)))
>
> I don't get why you have imbalance_scale applied to this_avg_load and
> not min_avg_load. IIUC, you end up preferring non-local groups?

In fact, I have kept the same condition that is used when looping over
the groups. You're right that we should prefer the local rq if the
avg_load values are close, and test the condition
(100*this_avg_load > imbalance_scale*min_avg_load) instead.
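
To make the difference concrete, here is a minimal sketch (not the patch
itself) of the avg_load tie-break that applies once the runnable loads are
within the margin. The helper names are hypothetical. It also assumes that
the condition above is read as the case in which the idlest group is chosen
over the local one, so the local group is kept whenever that condition is
false.

```c
#include <stdbool.h>

/*
 * Tie-break as posted in v2: the local group is kept only when the
 * idlest group's avg_load exceeds the local one by the scale factor.
 * Hypothetical helper, for illustration only.
 */
static bool local_kept_as_posted(unsigned long this_avg_load,
                                 unsigned long min_avg_load,
                                 unsigned long imbalance_scale)
{
    return 100 * min_avg_load > imbalance_scale * this_avg_load;
}

/*
 * Tie-break with the scale applied to min_avg_load instead.
 * ASSUMPTION: the condition from the reply is the "leave the local
 * group" case, so the local group is kept when it does not hold.
 */
static bool local_kept_corrected(unsigned long this_avg_load,
                                 unsigned long min_avg_load,
                                 unsigned long imbalance_scale)
{
    bool leave_local = 100 * this_avg_load > imbalance_scale * min_avg_load;

    return !leave_local;
}
```

With equal loads (this_avg_load == min_avg_load == 500) and a hypothetical
imbalance_scale of 112, the posted form does not keep the local group, while
the corrected form does. Both forms leave the local group when it is clearly
heavier (for example 700 against 500).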

>
> If we take the example where this_runnable_load == min_runnable_load and
> this_avg_load == min_avg_load. In this case, and in cases where
> min_avg_load is slightly bigger than this_avg_load, we end up picking
> the 'idlest' group even if the local group is equally good or even
> slightly better?
>
>>               return NULL;
>>       return idlest;
>>  }
>
> Overall, I like that load_avg is being brought in to make better
> decisions. The variable naming is a bit confusing. For example,
> runnable_load is a capacity-average just like avg_load. 'imbalance' is
> now an absolute capacity-average margin, but it is hard to come up with
> better short alternatives.
>
> Although 'imbalance' is based on the existing imbalance_pct, I find it
> somewhat arbitrary. Why is (imbalance_pct-100)*1024/100 a good absolute
> margin to define the interval where we want to consider load_avg? I
> guess it is a case of 'we had to pick some value', which we have done in
> many other places. Though, IMHO, it is a bit strange that imbalance_pct
> is used in two different ways to bias comparison in the same function.

I see imbalance_pct as the definition of the acceptable imbalance %
for a sched_domain. This % is then either applied to the current load
or used to define an absolute value.
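
As a rough illustration of those two uses, the sketch below derives both
values from one imbalance_pct. The absolute margin follows the formula quoted
in the review, (imbalance_pct-100)*1024/100. The relative scale of
100 + (imbalance_pct-100)/2 is an assumption about how imbalance_scale is
computed, and the function names and LOAD_SCALE are hypothetical.

```c
/* The 1024 from the quoted formula; the macro name is hypothetical. */
#define LOAD_SCALE 1024UL

/*
 * Absolute margin in load units, per the formula quoted in the thread.
 * Expects imbalance_pct >= 100.
 */
static unsigned long imbalance_margin(unsigned int imbalance_pct)
{
    return (imbalance_pct - 100) * LOAD_SCALE / 100;
}

/*
 * Relative scale in percent.
 * ASSUMPTION: half of the domain's excess percentage is used.
 * Expects imbalance_pct >= 100.
 */
static unsigned int imbalance_scale_pct(unsigned int imbalance_pct)
{
    return 100 + (imbalance_pct - 100) / 2;
}
```

For a hypothetical imbalance_pct of 125 this gives an absolute margin of 256
and a relative scale of 112.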

> It used to be only used as a scaling factor (now imbalance_scale), while
> this patch proposes to use it for computing an absolute margin
> (imbalance) as well. It is not a major issue, but it is not clear why it
> is used differently to compare two metrics that are relatively closely
> related.

In fact, a scaling factor (imbalance) doesn't work well with small
values. As an example, the use of a scaling factor fails as soon as
this_runnable_load = 0, because we always select the local rq even if
min_runnable_load is only 1, which doesn't really make sense because
the two loads are practically the same.
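
The zero-load case can be checked with a small sketch. The first helper
mirrors the removed test (100*this_load < imbalance*min_load) and the second
mirrors the new runnable-load test from the patch. The helper names and the
sample values are hypothetical.

```c
#include <stdbool.h>

/* Removed test: relative comparison, the local group wins if this holds. */
static bool local_wins_scaled(unsigned long this_load,
                              unsigned long min_load,
                              unsigned long scale)
{
    return 100 * this_load < scale * min_load;
}

/*
 * New test: the local group wins outright only if the idlest group's
 * runnable load exceeds the local one by more than an absolute margin.
 */
static bool local_wins_margin(unsigned long this_runnable_load,
                              unsigned long min_runnable_load,
                              unsigned long margin)
{
    return min_runnable_load > this_runnable_load + margin;
}
```

With this_load = 0 and min_load = 1, local_wins_scaled() is true for any
positive scale, so the local group is always selected. With a hypothetical
margin of 256, local_wins_margin(0, 1, 256) is false: the two loads are
treated as close and the decision falls through to the avg_load comparison.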

>
> Morten