linux-kernel - Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in select_idle

Message-ID: <b96c25e2-87bd-2fbc-2875-58d21f7f20e1@oracle.com>
Date:   Thu, 28 Sep 2017 10:09:14 -0500
From:   Rohit Jain <rohit.k.jain@...cle.com>
To:     joelaf <joelaf@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, eas-dev@...ts.linaro.org,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Atish Patra <atish.patra@...cle.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>
Subject: Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in
 select_idle_sibling code path

Hi Joel,

On 09/28/2017 05:53 AM, joelaf wrote:
> Hi Rohit,
>
> On Tue, Sep 26, 2017 at 12:48 PM, Rohit Jain <rohit.k.jain@...cle.com> wrote:
> [...]
>
<snip>
>>>>                   }
>>>>
>>>> -               if (idle)
>>>> -                       return core;
>>>> +               if (idle) {
>>>> +                       if (rcpu == -1)
>>>> +                               return (rcpu_backup != -1 ? rcpu_backup : core);
>>>> +                       return rcpu;
>>>> +               }
>>>
>>> This didn't make much sense to me: here you are returning either an
>>> SMT thread or a core. That doesn't make much of a difference because
>>> SMT threads share the same capacity (SD_SHARE_CPUCAPACITY). I think
>>> what you want to do is find out the capacity of a 'core', not an SMT
>>> thread, and compare the capacity of different cores and consider the
>>> one which has least RT/IRQ interference.
>>
>> IIUC the capacity of each strand is scaled by the IRQ and 'rt_avg' time
>> of that 'rq'. If the strand is idle now but gets an interrupt in the future,
>> the 'core' would look like:
>>
>>     +----+----+
>>     | I  |    |
>>     | T  |    |
>>     +----+----+
>>
>> (I -> Interrupt, T -> Thread we are trying to schedule).
>>
>> whereas if the other strand on the core was taking the interrupts, the core
>> would look like:
>>
>>     +----+----+
>>     | I  | T  |
>>     |    |    |
>>     +----+----+
>>
>> In this case, because we know from the past average that one of the strands
>> is running low on capacity, I am trying to return a better strand for the
>> thread to start on.
>>
> I know what you're trying to do, but the way you've retrofitted it into the
> core looks weird (to me) and makes the code unreadable and ugly IMO.
>
> Why not do something simpler, like skipping the core if any SMT thread has been
> running at reduced capacity? I'm not sure whether this works well, or whether the
> maintainers will prefer your approach or mine below, but I find the diff below much
> cleaner for the select_idle_core() bit. It also makes more sense to me: resources are
> shared at the SMT level, so it seems right to skip the core altogether for this:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6ee7242dbe0a..f324a84e29f1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5738,14 +5738,17 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>   
>   	for_each_cpu_wrap(core, cpus, target) {
>   		bool idle = true;
> +		bool full_cap = true;
>   
>   		for_each_cpu(cpu, cpu_smt_mask(core)) {
>   			cpumask_clear_cpu(cpu, cpus);
>   			if (!idle_cpu(cpu))
>   				idle = false;
> +			if (!full_capacity(cpu))
> +				full_cap = false;
>   		}
>   
> -		if (idle)
> +		if (idle && full_cap)
>   			return core;
>   	}
>   


Well, with your changes you will skip over fully idle cores, which is not
ideal either. I see that you were advocating for selecting an idle core
with the least capacity loss, whereas I was stopping at the first
idle core.

Since the whole philosophy of this patch so far is "Don't spare an
idle CPU", I think the following diff might look better to you. Please
note this is only for discussion's sake; I haven't fully tested it yet.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ec15e5f..c2933eb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6040,7 +6040,9 @@ void __update_idle_core(struct rq *rq)
  static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
  {
      struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
-    int core, cpu;
+    int core, cpu, rcpu, backup_core;
+
+    rcpu = backup_core = -1;

      if (!static_branch_likely(&sched_smt_present))
          return -1;
@@ -6052,15 +6054,34 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int

      for_each_cpu_wrap(core, cpus, target) {
          bool idle = true;
+        bool full_cap = true;

          for_each_cpu(cpu, cpu_smt_mask(core)) {
              cpumask_clear_cpu(cpu, cpus);
              if (!idle_cpu(cpu))
                  idle = false;
+
+            if (!full_capacity(cpu)) {
+                full_cap = false;
+            }
          }

-        if (idle)
+        if (idle && full_cap)
              return core;
+        else if (idle && backup_core == -1)
+            backup_core = core;
+    }
+
+    if (backup_core != -1) {
+        for_each_cpu(cpu, cpu_smt_mask(backup_core)) {
+            if (full_capacity(cpu))
+                return cpu;
+            else if ((rcpu == -1) ||
+                 (capacity_of(cpu) > capacity_of(rcpu)))
+                rcpu = cpu;
+        }
+
+        return rcpu;
      }


Do let me know what you think.

Thanks,
Rohit

>
> thanks,
>
> - Joel
>