linux-kernel - Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in select_idle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <0e4151ea-4952-8384-4877-ce149577f2ef@oracle.com>
Date:   Tue, 3 Oct 2017 17:21:30 -0700
From:   Rohit Jain <rohit.k.jain@...cle.com>
To:     Joel Fernandes <joelaf@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, eas-dev@...ts.linaro.org,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Atish Patra <atish.patra@...cle.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>
Subject: Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in
 select_idle_sibling code path

Hi Joel,


On 10/02/2017 09:52 PM, Joel Fernandes wrote:
> Hi Rohit,
>
> On Thu, Sep 28, 2017 at 8:09 AM, Rohit Jain <rohit.k.jain@...cle.com> wrote:
> [..]
>>>> With this case, because we know from the past avg, one of the strands is
>>>> running low on capacity, I am trying to return a better strand for the
>>>> thread to start on.
>>>>
>>> I know what you're trying to do but they way you've retrofitted it into
>>> the
>>> core looks weird (to me) and makes the code unreadable and ugly IMO.
>>>
>>> Why not do something simpler like skip the core if any SMT thread has been
>>> running at lesser capacity? I'm not sure if this works great or if the
>>> maintainers
>>> will prefer your or my below approach, but I find the below diff much
>>> cleaner
>>> for the select_idle_core bit. It also makes more sense since resources are
>>> shared at SMT level so makes sense to me to skip the core altogether for
>>> this:
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 6ee7242dbe0a..f324a84e29f1 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5738,14 +5738,17 @@ static int select_idle_core(struct task_struct *p,
>>> struct sched_domain *sd, int
>>>          for_each_cpu_wrap(core, cpus, target) {
>>>                  bool idle = true;
>>> +               bool full_cap = true;
>>>                  for_each_cpu(cpu, cpu_smt_mask(core)) {
>>>                          cpumask_clear_cpu(cpu, cpus);
>>>                          if (!idle_cpu(cpu))
>>>                                  idle = false;
>>> +                       if (!full_capacity(cpu))
>>> +                               full_cap = false;
>>>                  }
>>>    -             if (idle)
>>> +               if (idle && full_cap)
>>>                          return core;
>>>          }
>>>
>>
>>
>> Well, with your changes you will skip over fully idle cores which is not
>> an ideal thing either. I see that you were advocating for select
>> idle+lowest capacity core, whereas I was stopping at the first idlecore.
>>
>> Since the whole philosophy till now in this patch is "Don't spare an
>> idle CPU", I think the following diff might look better to you. Please
>> note this is only for discussion sakes, I haven't fully tested it yet.
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ec15e5f..c2933eb 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6040,7 +6040,9 @@ void __update_idle_core(struct rq *rq)
>>   static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
>> int target)
>>   {
>>       struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>> -    int core, cpu;
>> +    int core, cpu, rcpu, backup_core;
>> +
>> +    rcpu = backup_core = -1;
>>
>>       if (!static_branch_likely(&sched_smt_present))
>>           return -1;
>> @@ -6052,15 +6054,34 @@ static int select_idle_core(struct task_struct *p,
>> struct sched_domain *sd, int
>>
>>       for_each_cpu_wrap(core, cpus, target) {
>>           bool idle = true;
>> +        bool full_cap = true;
>>
>>           for_each_cpu(cpu, cpu_smt_mask(core)) {
>>               cpumask_clear_cpu(cpu, cpus);
>>               if (!idle_cpu(cpu))
>>                   idle = false;
>> +
>> +            if (!full_capacity(cpu)) {
>> +                full_cap = false;
>> +            }
>>           }
>>
>> -        if (idle)
>> +        if (idle && full_cap)
>>               return core;
>> +        else if (idle && backup_core == -1)
>> +            backup_core = core;
>> +    }
>> +
>> +    if (backup_core != -1) {
>> +        for_each_cpu(cpu, cpu_smt_mask(backup_core)) {
>> +            if (full_capacity(cpu))
>> +                return cpu;
>> +            else if ((rcpu == -1) ||
>> +                 (capacity_of(cpu) > capacity_of(rcpu)))
>> +                rcpu = cpu;
>> +        }
>> +
>> +        return rcpu;
>>       }
>>
>>
>> Do let me know what you think.
> I think that if there isn't a benefit in your tests in doing the above
> vs the simpler approach, then I prefer the simpler approach especially
> since there's no point/benefit in complicating the code for
> select_idle_core.

Fair enough!

If there are no more concerns in this version, then I will go ahead and
try out all that is discussed in this version and send an updated
version. Please let me know if there are any other concerns/feedback.

Thanks,
Rohit

>
> thanks,
>
> - Joel