lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 17 Jul 2023 17:48:02 +0530
From:   Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>
Cc:     Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ricardo Neri <ricardo.neri@...el.com>,
        "Ravi V . Shankar" <ravi.v.shankar@...el.com>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Len Brown <len.brown@...el.com>, Mel Gorman <mgorman@...e.de>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Valentin Schneider <vschneid@...hat.com>,
        Ionela Voinescu <ionela.voinescu@....com>, x86@...nel.org,
        linux-kernel@...r.kernel.org,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        naveen.n.rao@...ux.vnet.ibm.com,
        Yicong Yang <yangyicong@...ilicon.com>,
        Barry Song <v-songbaohua@...o.com>,
        Chen Yu <yu.c.chen@...el.com>, Hillf Danton <hdanton@...a.com>
Subject: Re: [Patch v3 1/6] sched/fair: Determine active load balance for SMT
 sched groups



On 7/17/23 4:40 PM, Peter Zijlstra wrote:
> On Mon, Jul 17, 2023 at 01:06:59AM +0530, Shrikanth Hegde wrote:
>>
>>
>> On 7/15/23 4:35 AM, Tim Chen wrote:
>>> On Fri, 2023-07-14 at 18:36 +0530, Shrikanth Hegde wrote:
>>>
>>>>
>>>>
>>>> If we consider symmetric platforms which have SMT4 such as power10. 
>>>> we have a topology like below. multiple such MC will form DIE(PKG)
>>>>
>>>>
>>>> [0 2 4 6][1 3 5 7][8 10 12 14][9 11 13 15]
>>>> [--SMT--][--SMT--][----SMT---][---SMT----]
>>>> [--sg1--][--sg1--][---sg1----][---sg1----]
>>>> [--------------MC------------------------]
>>>>
>>>> In case of SMT4, if there is any group which has 2 or more tasks, that 
>>>> group will be marked as group_smt_balance. previously, if that group had 2
>>>> or 3 tasks, it would have been marked as group_has_spare. Since all the groups have 
>>>> SMT that means behavior would be same fully busy right? That can cause some 
>>>> corner cases. No?
>>>
>>> You raised a good point. I was looking from SMT2
>>> perspective so group_smt_balance implies group_fully_busy.
>>> That is no longer true for SMT4.
>>>
>>> I am thinking of the following fix on the current patch
>>> to take care of SMT4. Do you think this addresses
>>
>> Thanks Tim for taking a look at it again. 
>>
>> Yes. I think this would address some of the corner cases. 
>> Any SMT4 group having 2,3,4 will have smt_balance as the group type, and busiest one 
>> is the one which has least number of idle cpu's. (same conditions as group_has_spare)
>>
>>
>>
>>
>>> concerns from you and Tobias?
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 294a662c9410..3fc8d3a3bd22 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -9588,6 +9588,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>>                 break;
>>>  
>>>         case group_smt_balance:
>>> +               /* no idle cpus on both groups handled by group_fully_busy below */
>>> +               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
>>> +                       if (sgs->idle_cpus > busiest->idle_cpus)
>>> +                               return false;
>>> +                       if (sgs->idle_cpus < busiest->idle_cpus)
>>> +                               return true;
>>> +                       if (sgs->sum_nr_running <= busiest_sum_nr_running)
>>> +                               return false;
>>> +                       else
>>> +                               return true;
>>> +               }
>>>
>>>
>>> I will be on vacation next three weeks so my response will be slow.
>>>
>>> Tim
>>>
>>>>
>>
>> Small suggestion to above code to avoid compiler warning of switch case falling
>> through and else case can be removed, since update_sd_pick_busiest by default returns true.
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index e5a75c76bcaa..ae364ac6f22e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9728,9 +9728,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>                                 return true;
>>                         if (sgs->sum_nr_running <= busiest->sum_nr_running)
>>                                 return false;
>> -                       else
>> -                               return true;
>>                 }
>> +               break;
>> +
>>         case group_fully_busy:
>>                 /*
>>                  * Select the fully busy group with highest avg_load. In
>>
>>
> 
> Can someone please send a full patch for this? I've already queued Tim's
> patches in tip/sched/core (tip-bot seems to have died somewhere last
> week, it's being worked on).

Hi Peter. 

Sending on behalf of tim. I have included my suggestion as well. Hope that's ok.
Please find below the patch as of now. it includes the couple of changes that are discussed. (in 1/6 and in 3/6)


---
 kernel/sched/fair.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 932e7b78894a..9502013abe33 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9532,7 +9532,7 @@ static inline long sibling_imbalance(struct lb_env *env,
 	imbalance /= ncores_local + ncores_busiest;
 
 	/* Take advantage of resource in an empty sched group */
-	if (imbalance == 0 && local->sum_nr_running == 0 &&
+	if (imbalance <= 1 && local->sum_nr_running == 0 &&
 	    busiest->sum_nr_running > 1)
 		imbalance = 2;
 
@@ -9720,6 +9720,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 
 	case group_smt_balance:
+		/* no idle cpus on both groups handled by group_fully_busy below */
+		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
+			if (sgs->idle_cpus > busiest->idle_cpus)
+				return false;
+			if (sgs->idle_cpus < busiest->idle_cpus)
+				return true;
+			if (sgs->sum_nr_running <= busiest->sum_nr_running)
+				return false;
+		}
+		break;
+
 	case group_fully_busy:
 		/*
 		 * Select the fully busy group with highest avg_load. In
-- 
2.31.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ