Date:   Thu, 12 Oct 2017 18:54:02 -0700
From:   Rohit Jain <rohit.k.jain@...cle.com>
To:     Joel Fernandes <joelaf@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Atish Patra <atish.patra@...cle.com>,
        LKML <linux-kernel@...r.kernel.org>, eas-dev@...ts.linaro.org,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>
Subject: Re: [PATCH 1/3] sched/fair: Introduce scaled capacity awareness in
 find_idlest_cpu code path

Hi Joel,


On 10/12/2017 02:47 PM, Joel Fernandes wrote:
> On Thu, Oct 12, 2017 at 10:03 AM, Rohit Jain <rohit.k.jain@...cle.com> wrote:
>> Hi Joel, Atish,
>>
>> Moving the off-line discussion to LKML just so everyone's on the same
>> page: I actually like this version now, and it is outperforming my
>> previous code, so I am on board with it. It makes the code simpler too.
> I think you should have explained what the version does differently.
> Nobody can read your mind.

I apologize for being terse (I will do better next time).

This is based on your (offline) suggestion (and rightly so) that
find_idlest_group() today bases its decision on capacity_spare_wake(),
which in turn looks only at the original capacity of the CPU. This
version changes that to use the current capacity, i.e. the capacity
left after being scaled down for IRQ/RT/etc. pressure.
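
For reference, the two helpers differ roughly as follows in the kernels
of this era (paraphrased, not quoted verbatim): cpu_capacity is
cpu_capacity_orig scaled down by scale_rt_capacity() to subtract the
time stolen by RT and IRQ work.

static unsigned long capacity_of(int cpu)
{
	/* current capacity: already reduced for RT/IRQ pressure */
	return cpu_rq(cpu)->cpu_capacity;
}

static inline unsigned long capacity_orig_of(int cpu)
{
	/* original capacity: unaffected by RT/IRQ pressure */
	return cpu_rq(cpu)->cpu_capacity_orig;
}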

Also, this version changes find_idlest_group_cpu() to no longer search
for CPUs via the 'full_capacity()' predicate; instead it finds the
idlest CPU with the maximum available capacity. This lets us drop all
the 'backup' bookkeeping that the earlier version (v5) needed.

As you can see, the code works out much simpler with the new search.
The cost is acceptable because we are already doing a full scan of the
CPUs in sched_group_span() anyway.

[..]
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 56f343b..a1f622c 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);
>>
>>   static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>>   {
>> -    return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
>> +    return capacity_of(cpu) - cpu_util_wake(cpu, p);
>>   }
>>
>>   /*
>> @@ -5870,6 +5870,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>>       unsigned long load, min_load = ULONG_MAX;
>>       unsigned int min_exit_latency = UINT_MAX;
>>       u64 latest_idle_timestamp = 0;
>> +    unsigned int idle_cpu_cap = 0;
>>       int least_loaded_cpu = this_cpu;
>>       int shallowest_idle_cpu = -1;
>>       int i;
>> @@ -5881,6 +5882,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>>       /* Traverse only the allowed CPUs */
>>       for_each_cpu_and(i, sched_group_span(group), &p->cpus_allowed) {
>>           if (idle_cpu(i)) {
>> +            int idle_candidate = -1;
>>               struct rq *rq = cpu_rq(i);
>>               struct cpuidle_state *idle = idle_get_state(rq);
>>               if (idle && idle->exit_latency < min_exit_latency) {
>> @@ -5891,7 +5893,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>>                    */
>>                   min_exit_latency = idle->exit_latency;
>>                   latest_idle_timestamp = rq->idle_stamp;
>> -                shallowest_idle_cpu = i;
>> +                idle_candidate = i;
>>               } else if ((!idle || idle->exit_latency == min_exit_latency) &&
>>                      rq->idle_stamp > latest_idle_timestamp) {
>>                   /*
>> @@ -5900,8 +5902,14 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>>                    * a warmer cache.
>>                    */
>>                   latest_idle_timestamp = rq->idle_stamp;
>> -                shallowest_idle_cpu = i;
>> +                idle_candidate = i;
>>               }
>> +
>> +            if (idle_candidate != -1 &&
>> +                (capacity_of(idle_candidate) > idle_cpu_cap)) {
>> +                shallowest_idle_cpu = idle_candidate;
>> +                idle_cpu_cap = capacity_of(idle_candidate);
>> +            }
> This is broken: in the case where idle_candidate != -1 but the
> idle_cpu_cap comparison makes the condition false, you're still
> setting min_exit_latency, which is wrong.

Yes, you're right. I will fix this.
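
Concretely, I am thinking of something along these lines (an untested
sketch on top of the diff above, not the final patch): commit the
bookkeeping only once a candidate has also won on capacity, so
min_exit_latency and latest_idle_timestamp always describe the CPU we
actually picked. (The naming issue you raise below still stands, of
course.)

	if (idle_cpu(i)) {
		int idle_candidate = -1;
		struct rq *rq = cpu_rq(i);
		struct cpuidle_state *idle = idle_get_state(rq);

		if (idle && idle->exit_latency < min_exit_latency)
			idle_candidate = i;
		else if ((!idle || idle->exit_latency == min_exit_latency) &&
			 rq->idle_stamp > latest_idle_timestamp)
			idle_candidate = i;

		/*
		 * Only commit the tracking state when the candidate also
		 * wins on capacity; a rejected candidate must not leave
		 * min_exit_latency (or anything else) behind.
		 */
		if (idle_candidate != -1 &&
		    capacity_of(idle_candidate) > idle_cpu_cap) {
			if (idle)
				min_exit_latency = idle->exit_latency;
			latest_idle_timestamp = rq->idle_stamp;
			shallowest_idle_cpu = idle_candidate;
			idle_cpu_cap = capacity_of(idle_candidate);
		}
	}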

>
> Also this means if you have 2 CPUs and 1 is in a shallower idle state
> than the other, but lesser in capacity, then it would select the CPU
> with less shallow idle state right? So 'shallowest_idle_cpu' loses its
> meaning.

OK, I will change the name.

Thanks,
Rohit
> [..]
