linux-kernel - Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in select_idle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ffb62cdf-570c-227b-9390-06af864b6730@oracle.com>
Date:   Tue, 26 Sep 2017 12:48:25 -0700
From:   Rohit Jain <rohit.k.jain@...cle.com>
To:     Joel Fernandes <joelaf@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, eas-dev@...ts.linaro.org,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Atish Patra <atish.patra@...cle.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>
Subject: Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in
 select_idle_sibling code path

On 09/25/2017 11:53 PM, Joel Fernandes wrote:
> Hi Rohit,
>
> Just some comments:

Hi Joel,

Thanks for the comments.

> On Mon, Sep 25, 2017 at 5:02 PM, Rohit Jain <rohit.k.jain@...cle.com> wrote:
>> While looking for CPUs to place running tasks on, the scheduler
>> completely ignores the capacity stolen away by RT/IRQ tasks.
>>
>> This patch fixes that.
>>
>> Signed-off-by: Rohit Jain <rohit.k.jain@...cle.com>
>> ---
>>   kernel/sched/fair.c | 54 ++++++++++++++++++++++++++++++++++++++++++-----------
>>   1 file changed, 43 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index afb701f..19ff2c3 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6040,7 +6040,10 @@ void __update_idle_core(struct rq *rq)
>>   static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>>   {
>>          struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>> -       int core, cpu;
>> +       int core, cpu, rcpu, rcpu_backup;
> I would call rcpu_backup as backup_cpu.

OK

>
>> +       unsigned int backup_cap = 0;
>> +
>> +       rcpu = rcpu_backup = -1;
>>
>>          if (!static_branch_likely(&sched_smt_present))
>>                  return -1;
>> @@ -6057,10 +6060,20 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>>                          cpumask_clear_cpu(cpu, cpus);
>>                          if (!idle_cpu(cpu))
>>                                  idle = false;
>> +
>> +                       if (full_capacity(cpu)) {
>> +                               rcpu = cpu;
>> +                       } else if ((rcpu == -1) && (capacity_of(cpu) > backup_cap)) {
>> +                               backup_cap = capacity_of(cpu);
>> +                               rcpu_backup = cpu;
>> +                       }
> Here you comparing capacity of different SMT threads.
>
>>                  }
>>
>> -               if (idle)
>> -                       return core;
>> +               if (idle) {
>> +                       if (rcpu == -1)
>> +                               return (rcpu_backup != -1 ? rcpu_backup : core);
>> +                       return rcpu;
>> +               }
>
> This didn't make much sense to me, here you are returning either an
> SMT thread or a core. That doesn't make much of a difference because
> SMT threads share the same capacity (SD_SHARE_CPUCAPACITY). I think
> what you want to do is find out the capacity of a 'core', not an SMT
> thread, and compare the capacity of different cores and consider the
> one which has least RT/IRQ interference.

IIUC the capacities of each strand is scaled by IRQ and 'rt_avg' for that
'rq'. Now if the strand is idle now and gets an interrupt in the future,
the 'core' would look like:

    +----+----+
    | I  |    |
    | T  |    |
    +----+----+

(I -> Interrupt, T-> Thread we are trying to schedule).

whereas if the other strand on the core was taking interrupt the core
would look like:

    +----+----+
    | I  | T  |
    |    |    |
    +----+----+

With this case, because we know from the past avg, one of the strands is
running low on capacity, I am trying to return a better strand for the
thread to start on.

>
>>          }
>>
>>          /*
>> @@ -6076,7 +6089,8 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>>    */
>>   static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
>>   {
>> -       int cpu;
>> +       int cpu, backup_cpu = -1;
>> +       unsigned int backup_cap = 0;
>>
>>          if (!static_branch_likely(&sched_smt_present))
>>                  return -1;
>> @@ -6084,11 +6098,17 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
>>          for_each_cpu(cpu, cpu_smt_mask(target)) {
>>                  if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
>>                          continue;
>> -               if (idle_cpu(cpu))
>> -                       return cpu;
>> +               if (idle_cpu(cpu)) {
>> +                       if (full_capacity(cpu))
>> +                               return cpu;
>> +                       if (capacity_of(cpu) > backup_cap) {
>> +                               backup_cap = capacity_of(cpu);
>> +                               backup_cpu = cpu;
>> +                       }
>> +               }
> Same thing here, since SMT threads share the same underlying capacity,
> is there any point in comparing the capacities of each SMT thread?

See above

Thanks,
Rohit

>
> thanks,
>
> - Joel
>
> [...]