[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <eba3aa23-99ca-ce72-4933-940b8dd6ff6b@linux.vnet.ibm.com>
Date: Mon, 17 Jul 2023 17:48:02 +0530
From: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>,
Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Ricardo Neri <ricardo.neri@...el.com>,
"Ravi V . Shankar" <ravi.v.shankar@...el.com>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Len Brown <len.brown@...el.com>, Mel Gorman <mgorman@...e.de>,
"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Steven Rostedt <rostedt@...dmis.org>,
Valentin Schneider <vschneid@...hat.com>,
Ionela Voinescu <ionela.voinescu@....com>, x86@...nel.org,
linux-kernel@...r.kernel.org,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
naveen.n.rao@...ux.vnet.ibm.com,
Yicong Yang <yangyicong@...ilicon.com>,
Barry Song <v-songbaohua@...o.com>,
Chen Yu <yu.c.chen@...el.com>, Hillf Danton <hdanton@...a.com>
Subject: Re: [Patch v3 1/6] sched/fair: Determine active load balance for SMT
sched groups
On 7/17/23 4:40 PM, Peter Zijlstra wrote:
> On Mon, Jul 17, 2023 at 01:06:59AM +0530, Shrikanth Hegde wrote:
>>
>>
>> On 7/15/23 4:35 AM, Tim Chen wrote:
>>> On Fri, 2023-07-14 at 18:36 +0530, Shrikanth Hegde wrote:
>>>
>>>>
>>>>
>>>> If we consider symmetric platforms which have SMT4 such as power10.
>>>> we have a topology like below. multiple such MC will form DIE(PKG)
>>>>
>>>>
>>>> [0 2 4 6][1 3 5 7][8 10 12 14][9 11 13 15]
>>>> [--SMT--][--SMT--][----SMT---][---SMT----]
>>>> [--sg1--][--sg1--][---sg1----][---sg1----]
>>>> [--------------MC------------------------]
>>>>
>>>> In case of SMT4, if there is any group which has 2 or more tasks, that
>>>> group will be marked as group_smt_balance. previously, if that group had 2
>>>> or 3 tasks, it would have been marked as group_has_spare. Since all the groups have
>>>> SMT that means behavior would be same fully busy right? That can cause some
>>>> corner cases. No?
>>>
>>> You raised a good point. I was looking from SMT2
>>> perspective so group_smt_balance implies group_fully_busy.
>>> That is no longer true for SMT4.
>>>
>>> I am thinking of the following fix on the current patch
>>> to take care of SMT4. Do you think this addresses
>>
>> Thanks Tim for taking a look at it again.
>>
>> Yes. I think this would address some of the corner cases.
>> Any SMT4 group having 2,3,4 will have smt_balance as the group type, and busiest one
>> is the one which has least number of idle cpu's. (same conditions as group_has_spare)
>>
>>
>>
>>
>>> concerns from you and Tobias?
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 294a662c9410..3fc8d3a3bd22 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -9588,6 +9588,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>> break;
>>>
>>> case group_smt_balance:
>>> + /* no idle cpus on both groups handled by group_fully_busy below */
>>> + if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
>>> + if (sgs->idle_cpus > busiest->idle_cpus)
>>> + return false;
>>> + if (sgs->idle_cpus < busiest->idle_cpus)
>>> + return true;
>>> + if (sgs->sum_nr_running <= busiest_sum_nr_running)
>>> + return false;
>>> + else
>>> + return true;
>>> + }
>>>
>>>
>>> I will be on vacation next three weeks so my response will be slow.
>>>
>>> Tim
>>>
>>>>
>>
>> Small suggestion to above code to avoid compiler warning of switch case falling
>> through and else case can be removed, since update_sd_pick_busiest by default returns true.
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index e5a75c76bcaa..ae364ac6f22e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9728,9 +9728,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>> return true;
>> if (sgs->sum_nr_running <= busiest->sum_nr_running)
>> return false;
>> - else
>> - return true;
>> }
>> + break;
>> +
>> case group_fully_busy:
>> /*
>> * Select the fully busy group with highest avg_load. In
>>
>>
>
> Can someone please send a full patch for this? I've already queued Tim's
> patches in tip/sched/core (tip-bot seems to have died somewhere last
> week, it's being worked on).
Hi Peter.
Sending on behalf of tim. I have included my suggestion as well. Hope that's ok.
Please find below the patch as of now. it includes the couple of changes that are discussed. (in 1/6 and in 3/6)
---
kernel/sched/fair.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 932e7b78894a..9502013abe33 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9532,7 +9532,7 @@ static inline long sibling_imbalance(struct lb_env *env,
imbalance /= ncores_local + ncores_busiest;
/* Take advantage of resource in an empty sched group */
- if (imbalance == 0 && local->sum_nr_running == 0 &&
+ if (imbalance <= 1 && local->sum_nr_running == 0 &&
busiest->sum_nr_running > 1)
imbalance = 2;
@@ -9720,6 +9720,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
break;
case group_smt_balance:
+ /* no idle cpus on both groups handled by group_fully_busy below */
+ if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
+ if (sgs->idle_cpus > busiest->idle_cpus)
+ return false;
+ if (sgs->idle_cpus < busiest->idle_cpus)
+ return true;
+ if (sgs->sum_nr_running <= busiest->sum_nr_running)
+ return false;
+ }
+ break;
+
case group_fully_busy:
/*
* Select the fully busy group with highest avg_load. In
--
2.31.1
Powered by blists - more mailing lists