Message-ID: <b96c25e2-87bd-2fbc-2875-58d21f7f20e1@oracle.com>
Date: Thu, 28 Sep 2017 10:09:14 -0500
From: Rohit Jain <rohit.k.jain@...cle.com>
To: joelaf <joelaf@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>, eas-dev@...ts.linaro.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Atish Patra <atish.patra@...cle.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Morten Rasmussen <morten.rasmussen@....com>
Subject: Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in
select_idle_sibling code path
Hi Joel,
On 09/28/2017 05:53 AM, joelaf wrote:
> Hi Rohit,
>
> On Tue, Sep 26, 2017 at 12:48 PM, Rohit Jain <rohit.k.jain@...cle.com> wrote:
> [...]
>
<snip>
>>>> }
>>>>
>>>> - if (idle)
>>>> - return core;
>>>> + if (idle) {
>>>> + if (rcpu == -1)
>>>> + return (rcpu_backup != -1 ? rcpu_backup : core);
>>>> + return rcpu;
>>>> + }
>>>
>>> This didn't make much sense to me, here you are returning either an
>>> SMT thread or a core. That doesn't make much of a difference because
>>> SMT threads share the same capacity (SD_SHARE_CPUCAPACITY). I think
>>> what you want to do is find out the capacity of a 'core', not an SMT
>>> thread, and compare the capacity of different cores and consider the
>>> one which has least RT/IRQ interference.
>>
>> IIUC the capacity of each strand is scaled by IRQ and 'rt_avg' for that
>> 'rq'. Now if the strand is idle now and gets an interrupt in the future,
>> the 'core' would look like:
>>
>> +----+----+
>> | I | |
>> | T | |
>> +----+----+
>>
>> (I -> Interrupt, T-> Thread we are trying to schedule).
>>
>> whereas if the other strand on the core was taking interrupt the core
>> would look like:
>>
>> +----+----+
>> | I | T |
>> | | |
>> +----+----+
>>
>> With this case, because we know from the past avg, one of the strands is
>> running low on capacity, I am trying to return a better strand for the
>> thread to start on.
>>
> I know what you're trying to do but the way you've retrofitted it into the
> core looks weird (to me) and makes the code unreadable and ugly IMO.
>
> Why not do something simpler like skip the core if any SMT thread has been
> running at lesser capacity? I'm not sure if this works great or if the maintainers
> will prefer your or my below approach, but I find the below diff much cleaner
> for the select_idle_core bit. It also makes more sense since resources are
> shared at SMT level so makes sense to me to skip the core altogether for this:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6ee7242dbe0a..f324a84e29f1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5738,14 +5738,17 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>
> for_each_cpu_wrap(core, cpus, target) {
> bool idle = true;
> + bool full_cap = true;
>
> for_each_cpu(cpu, cpu_smt_mask(core)) {
> cpumask_clear_cpu(cpu, cpus);
> if (!idle_cpu(cpu))
> idle = false;
> + if (!full_capacity(cpu))
> + full_cap = false;
> }
>
> - if (idle)
> + if (idle && full_cap)
> return core;
> }
>
Well, with your changes you will skip over fully idle cores, which is not
ideal either. I see that you were advocating for select
idle+lowest capacity core, whereas I was stopping at the first idle core.
Since the whole philosophy in this patch so far is "Don't spare an
idle CPU", I think the following diff might look better to you. Please
note this is only for discussion's sake; I haven't fully tested it yet.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ec15e5f..c2933eb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6040,7 +6040,9 @@ void __update_idle_core(struct rq *rq)
static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
{
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
- int core, cpu;
+ int core, cpu, rcpu, backup_core;
+
+ rcpu = backup_core = -1;
if (!static_branch_likely(&sched_smt_present))
return -1;
@@ -6052,15 +6054,34 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
for_each_cpu_wrap(core, cpus, target) {
bool idle = true;
+ bool full_cap = true;
for_each_cpu(cpu, cpu_smt_mask(core)) {
cpumask_clear_cpu(cpu, cpus);
if (!idle_cpu(cpu))
idle = false;
+
+ if (!full_capacity(cpu)) {
+ full_cap = false;
+ }
}
- if (idle)
+ if (idle && full_cap)
return core;
+ else if (idle && backup_core == -1)
+ backup_core = core;
+ }
+
+ if (backup_core != -1) {
+ for_each_cpu(cpu, cpu_smt_mask(backup_core)) {
+ if (full_capacity(cpu))
+ return cpu;
+ else if ((rcpu == -1) ||
+ (capacity_of(cpu) > capacity_of(rcpu)))
+ rcpu = cpu;
+ }
+
+ return rcpu;
}
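
To make the selection order in the diff above easier to experiment with
outside the kernel, here is a minimal userspace sketch. It is only a model,
not kernel code: struct cpu_model, the *_model function names, the fixed
two-strands-per-core layout and the FULL_CAP_THRESHOLD value are all
assumptions made for illustration. full_capacity() is not defined in this
message, so the threshold used here is a guess. The sketch also scans cores
from 0 instead of wrapping from 'target'.

```c
#include <stdbool.h>

#define SMT_WIDTH           2    /* assumption: two strands per core */
#define FULL_CAP_THRESHOLD  768  /* assumption: stand-in for full_capacity() */

/* Hypothetical per-CPU state; the kernel reads this from the runqueue. */
struct cpu_model {
	bool idle;      /* stand-in for idle_cpu(cpu) */
	int  capacity;  /* stand-in for capacity_of(cpu) */
};

static bool full_capacity_model(const struct cpu_model *c)
{
	return c->capacity >= FULL_CAP_THRESHOLD;
}

/*
 * CPUs are laid out core by core: cpus[core * SMT_WIDTH + strand].
 * Returns a CPU index, or -1 if no fully idle core exists.
 */
int select_idle_core_model(const struct cpu_model *cpus, int ncores)
{
	int backup_core = -1;

	for (int core = 0; core < ncores; core++) {
		bool idle = true;
		bool full_cap = true;

		for (int t = 0; t < SMT_WIDTH; t++) {
			const struct cpu_model *c = &cpus[core * SMT_WIDTH + t];

			if (!c->idle)
				idle = false;
			if (!full_capacity_model(c))
				full_cap = false;
		}

		/* Best case: every strand is idle and at full capacity. */
		if (idle && full_cap)
			return core * SMT_WIDTH;

		/* Remember the first idle core that has reduced capacity. */
		if (idle && backup_core == -1)
			backup_core = core;
	}

	if (backup_core != -1) {
		int rcpu = -1;

		/* Within the backup core, prefer the strand with most capacity. */
		for (int t = 0; t < SMT_WIDTH; t++) {
			int cpu = backup_core * SMT_WIDTH + t;

			if (full_capacity_model(&cpus[cpu]))
				return cpu;
			if (rcpu == -1 || cpus[cpu].capacity > cpus[rcpu].capacity)
				rcpu = cpu;
		}
		return rcpu;
	}

	return -1;
}
```

The point of the model is the ordering: an idle core at full capacity wins
immediately, an idle core with reduced capacity is only used when no such
core is found, and in that case the strand with the most remaining capacity
is returned.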
Do let me know what you think.
Thanks,
Rohit
>
> thanks,
>
> - Joel
>