linux-kernel - Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in select_idle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Mon, 2 Oct 2017 21:52:50 -0700
From:   Joel Fernandes <joelaf@...gle.com>
To:     Rohit Jain <rohit.k.jain@...cle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, eas-dev@...ts.linaro.org,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Atish Patra <atish.patra@...cle.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>
Subject: Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in
 select_idle_sibling code path

Hi Rohit,

On Thu, Sep 28, 2017 at 8:09 AM, Rohit Jain <rohit.k.jain@...cle.com> wrote:
[..]
>>>
>>> With this case, because we know from the past avg, one of the strands is
>>> running low on capacity, I am trying to return a better strand for the
>>> thread to start on.
>>>
>> I know what you're trying to do but they way you've retrofitted it into
>> the
>> core looks weird (to me) and makes the code unreadable and ugly IMO.
>>
>> Why not do something simpler like skip the core if any SMT thread has been
>> running at lesser capacity? I'm not sure if this works great or if the
>> maintainers
>> will prefer your or my below approach, but I find the below diff much
>> cleaner
>> for the select_idle_core bit. It also makes more sense since resources are
>> shared at SMT level so makes sense to me to skip the core altogether for
>> this:
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6ee7242dbe0a..f324a84e29f1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5738,14 +5738,17 @@ static int select_idle_core(struct task_struct *p,
>> struct sched_domain *sd, int
>>         for_each_cpu_wrap(core, cpus, target) {
>>                 bool idle = true;
>> +               bool full_cap = true;
>>                 for_each_cpu(cpu, cpu_smt_mask(core)) {
>>                         cpumask_clear_cpu(cpu, cpus);
>>                         if (!idle_cpu(cpu))
>>                                 idle = false;
>> +                       if (!full_capacity(cpu))
>> +                               full_cap = false;
>>                 }
>>   -             if (idle)
>> +               if (idle && full_cap)
>>                         return core;
>>         }
>>
>
>
>
> Well, with your changes you will skip over fully idle cores which is not
> an ideal thing either. I see that you were advocating for select
> idle+lowest capacity core, whereas I was stopping at the first idlecore.
>
> Since the whole philosophy till now in this patch is "Don't spare an
> idle CPU", I think the following diff might look better to you. Please
> note this is only for discussion sakes, I haven't fully tested it yet.
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ec15e5f..c2933eb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6040,7 +6040,9 @@ void __update_idle_core(struct rq *rq)
>  static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
> int target)
>  {
>      struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> -    int core, cpu;
> +    int core, cpu, rcpu, backup_core;
> +
> +    rcpu = backup_core = -1;
>
>      if (!static_branch_likely(&sched_smt_present))
>          return -1;
> @@ -6052,15 +6054,34 @@ static int select_idle_core(struct task_struct *p,
> struct sched_domain *sd, int
>
>      for_each_cpu_wrap(core, cpus, target) {
>          bool idle = true;
> +        bool full_cap = true;
>
>          for_each_cpu(cpu, cpu_smt_mask(core)) {
>              cpumask_clear_cpu(cpu, cpus);
>              if (!idle_cpu(cpu))
>                  idle = false;
> +
> +            if (!full_capacity(cpu)) {
> +                full_cap = false;
> +            }
>          }
>
> -        if (idle)
> +        if (idle && full_cap)
>              return core;
> +        else if (idle && backup_core == -1)
> +            backup_core = core;
> +    }
> +
> +    if (backup_core != -1) {
> +        for_each_cpu(cpu, cpu_smt_mask(backup_core)) {
> +            if (full_capacity(cpu))
> +                return cpu;
> +            else if ((rcpu == -1) ||
> +                 (capacity_of(cpu) > capacity_of(rcpu)))
> +                rcpu = cpu;
> +        }
> +
> +        return rcpu;
>      }
>
>
> Do let me know what you think.

I think that if there isn't a benefit in your tests in doing the above
vs the simpler approach, then I prefer the simpler approach especially
since there's no point/benefit in complicating the code for
select_idle_core.

thanks,

- Joel