Date:   Tue, 30 Jan 2018 11:47:46 -0800
From:   Rohit Jain <rohit.k.jain@...cle.com>
To:     Joel Fernandes <joelaf@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>, steven.sistare@...cle.com,
        dhaval.giani@...cle.com,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        "Cc: EAS Dev" <eas-dev@...ts.linaro.org>
Subject: Re: [RESEND PATCH] sched/fair: consider RT/IRQ pressure in
 select_idle_sibling

Hi Joel,

On 1/29/2018 7:39 PM, Joel Fernandes wrote:
> Hi Rohit,
>
> On Mon, Jan 29, 2018 at 3:27 PM, Rohit Jain <rohit.k.jain@...cle.com> wrote:
>> Currently, the fast path in the scheduler looks for an idle CPU to
>> schedule threads on. Capacity is taken into account in the function
>> 'select_task_rq_fair' when it calls 'wake_cap'; however, it ignores the
>> instantaneous capacity and looks only at the original capacity.
>> Furthermore, the select_idle_sibling path ignores the RT/IRQ threads
>> which are also running on the CPUs it is looking to schedule fair
>> threads on.
>>
>> We don't necessarily have to force the code to go to the slow path (by
>> modifying wake_cap); instead, we can do a better selection of the CPU
>> in the current domain itself (i.e. in the fast path).
>>
>> This patch makes the fast path aware of capacity, resulting in overall
>> performance improvements as shown in the test results.
>>
> [...]
>> I also ran uperf and sysbench MySQL workloads but saw no statistically
>> significant change.
>>
>> Signed-off-by: Rohit Jain <rohit.k.jain@...cle.com>
>> ---
>>   kernel/sched/fair.c | 38 ++++++++++++++++++++++++++++----------
>>   1 file changed, 28 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 26a71eb..ce5ccf8 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5625,6 +5625,11 @@ static unsigned long capacity_orig_of(int cpu)
>>          return cpu_rq(cpu)->cpu_capacity_orig;
>>   }
>>
>> +static inline bool full_capacity(int cpu)
>> +{
>> +       return capacity_of(cpu) >= (capacity_orig_of(cpu)*3)/4;
>> +}
>> +
>>   static unsigned long cpu_avg_load_per_task(int cpu)
>>   {
>>          struct rq *rq = cpu_rq(cpu);
>> @@ -6081,7 +6086,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>>
>>                  for_each_cpu(cpu, cpu_smt_mask(core)) {
>>                          cpumask_clear_cpu(cpu, cpus);
>> -                       if (!idle_cpu(cpu))
>> +                       if (!idle_cpu(cpu) || !full_capacity(cpu))
>>                                  idle = false;
>>                  }
> There's some difference in logic between select_idle_core and
> select_idle_cpu as far as the full_capacity stuff you're adding goes.
> In select_idle_core, if all CPUs are !full_capacity, you're returning
> -1. But in select_idle_cpu you're returning the idle CPU with the most
> capacity among the !full_capacity ones. Why is there this difference
> in logic? Did I miss something?

This is the previous discussion on this same code. I measured the
performance difference and saw no statistically significant impact, so
I went with your suggestion of the simpler code.

https://lkml.org/lkml/2017/10/3/1001
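
For illustration, here is a rough sketch of the behavioural difference
you are describing (simplified signatures and loop bodies, not the
actual patch; the helpers follow the names used above):

/*
 * Sketch only: the core-level path gives up entirely when no idle core
 * is at full capacity, while the CPU-level path keeps the best
 * capacity-constrained idle CPU as a fallback.
 */
static int sketch_select_idle_core(const struct cpumask *cores)
{
        int core, cpu;

        for_each_cpu(core, cores) {
                bool idle = true;

                for_each_cpu(cpu, cpu_smt_mask(core)) {
                        if (!idle_cpu(cpu) || !full_capacity(cpu))
                                idle = false;
                }
                if (idle)
                        return core;
        }

        return -1;      /* no full-capacity idle core: bail out */
}

static int sketch_select_idle_cpu(const struct cpumask *cpus)
{
        unsigned long best_cap = 0;
        int cpu, backup = -1;

        for_each_cpu(cpu, cpus) {
                if (!idle_cpu(cpu))
                        continue;
                if (full_capacity(cpu))
                        return cpu;
                if (capacity_of(cpu) > best_cap) {
                        best_cap = capacity_of(cpu);
                        backup = cpu;   /* best capacity-constrained idle CPU */
                }
        }

        return backup;
}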

>
>> @@ -6102,7 +6107,8 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>>    */
>>   static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
>>   {
>> -       int cpu;
>> +       int cpu, rcpu = -1;
>> +       unsigned long max_cap = 0;
>>
>>          if (!static_branch_likely(&sched_smt_present))
>>                  return -1;
>> @@ -6110,11 +6116,13 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
>>          for_each_cpu(cpu, cpu_smt_mask(target)) {
>>                  if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
>>                          continue;
>> -               if (idle_cpu(cpu))
>> -                       return cpu;
>> +               if (idle_cpu(cpu) && (capacity_of(cpu) > max_cap)) {
>> +                       max_cap = capacity_of(cpu);
>> +                       rcpu = cpu;
> At the SMT level, do you need to bother with choosing the best
> capacity among threads? If RT is eating into one SMT thread's
> underlying capacity, it would eat into the other's as well. Wondering
> what the benefit of doing this here is.

Yes, you are right, because of SD_SHARE_CPUCAPACITY; however, the
benefit is that if we don't do this check, we might end up picking an
SMT thread with "high" RT/IRQ activity, and the woken task then sits on
that run queue for a while, until the pull side can bail us out.
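
To make that concrete, here is the full_capacity() helper from the
patch again, with made-up capacity numbers showing where the 3/4
threshold bites:

/*
 * Illustration only, invented numbers:
 *
 *   capacity_orig_of(cpu) = 1024             (nominal)
 *   threshold             = (1024 * 3) / 4 = 768
 *
 *   capacity_of(cpu0) = 900  -> 900 >= 768, full_capacity() is true
 *   capacity_of(cpu1) = 500  -> 500 <  768, full_capacity() is false
 *                               (capacity eaten by RT/IRQ time)
 *
 * Even if cpu1 happens to be idle at wakeup, preferring cpu0 avoids
 * queueing the fair task behind cpu1's RT/IRQ activity until the pull
 * side of load balancing rescues it.
 */
static inline bool full_capacity(int cpu)
{
        return capacity_of(cpu) >= (capacity_orig_of(cpu) * 3) / 4;
}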

Thanks,
Rohit

> thanks,
>
> - Joel
