Open Source and information security mailing list archives
 
Date: Tue, 09 Jan 2024 16:58:27 -0800
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>, Srikar Dronamraju
	 <srikar@...ux.vnet.ibm.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...nel.org>, 
 Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, Juri
 Lelli <juri.lelli@...hat.com>,  Vincent Guittot
 <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
 Gorman <mgorman@...e.de>, Daniel Bristot de Oliveira <bristot@...hat.com>,
 Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched/fair: Enable group_asym_packing in
 find_idlest_group

On Thu, 2024-01-04 at 21:20 +0530, Shrikanth Hegde wrote:
> On 10/18/23 9:20 PM, Srikar Dronamraju wrote:
> 
> Hi Srikar, 
> 
> > Current scheduler code doesn't handle SD_ASYM_PACKING in the
> > find_idlest_cpu path. On a few architectures, like PowerPC, the cache
> > is shared at the core level, so moving threads across cores can result
> > in cache misses.
> > 
> > While asym_packing can be enabled above the SMT level, enabling asym
> > packing across cores could result in poorer performance due to cache
> > misses. However, if the initial task placement via find_idlest_cpu
> > takes asym_packing into consideration, then the scheduler can avoid
> > asym_packing migrations. This results in fewer migrations, better
> > packing, and better overall performance.
> > 
> > 
> 
> This would handle the asym packing case when finding an idle CPU for a newly
> woken task, thereby reducing the number of migrations if it is placed correctly
> in the first place. I think that's helpful.
> 
> Currently, Intel clusters and PowerVM shared LPARs are the two cases where
> ASYM_PACKING is enabled at a domain higher than SMT. Is that correct, or is
> there any other topology?
> 
> +tim 
> 
> > Signed-off-by: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
> > ---
> >  kernel/sched/fair.c | 33 ++++++++++++++++++++++++++++++---
> >  1 file changed, 30 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index cb225921bbca..7164f79a3d13 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9931,11 +9931,13 @@ static int idle_cpu_without(int cpu, struct task_struct *p)
> >   * @group: sched_group whose statistics are to be updated.
> >   * @sgs: variable to hold the statistics for this group.
> >   * @p: The task for which we look for the idlest group/CPU.
> > + * @this_cpu: current cpu
> >   */
> >  static inline void update_sg_wakeup_stats(struct sched_domain *sd,
> >  					  struct sched_group *group,
> >  					  struct sg_lb_stats *sgs,
> > -					  struct task_struct *p)
> > +					  struct task_struct *p,
> > +					  int this_cpu)
> >  {
> >  	int i, nr_running;
> >  
> > @@ -9972,6 +9974,11 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
> >  
> >  	}
> >  
> > +	if (sd->flags & SD_ASYM_PACKING && sgs->sum_h_nr_running &&
> > +			sched_asym_prefer(group->asym_prefer_cpu, this_cpu)) {
> > +		sgs->group_asym_packing = 1;

I disagree with the above criterion for doing asym_packing.

I think asym packing only makes sense if you have an idle CPU available
in the group that is preferred over this_cpu, and you have fewer tasks
than CPUs.  Using group->asym_prefer_cpu is inappropriate, as that most
preferred CPU may be busy.  You should be migrating the task from
this_cpu to that highest-priority idle CPU identified.

If the group is fully busy or overloaded, we should stick with the original
logic of picking the most lightly loaded group and not use asym_packing.

You may want to note down the idle CPU with the highest priority in the
group (i.e. the most preferred one, if the group has more than one idle
CPU), so you can compare between two groups that both have idle CPUs.

Tim

> > +	}
> > +
> 
> 
> I think there is a corner case here that could be taken care of. Please
> correct me if I am wrong.
> 
> Assume there are four sched groups, sg1, sg2, sg3 and sg4, and asym packing is
> enabled at sd. sg1 and sg3 have one task each, and a new task is being created,
> so find_idlest_cpu is called for this new task.
> 
> Because of the sgs->sum_h_nr_running check, sg1 and sg3 will have
> group_asym_packing, while sg2 and sg4 will have group_has_spare.
> update_pick_idlest will choose the lowest type, so group_has_spare. The tie
> would then be between sg2 and sg4. Because of asym packing (at least in the
> PowerPC shared LPAR case), sg4 will have lower utilization than sg2, and hence
> sg4 will be returned as the idlest group. On the next load balance, sg2 will
> pull the task from sg4 due to asym packing.
> 
> Could this additional migration be avoided by omitting the sum_h_nr_running
> check?
> 
> 
> >  	sgs->group_capacity = group->sgc->capacity;
> >  
> >  	sgs->group_weight = group->group_weight;
> > @@ -10012,8 +10019,17 @@ static bool update_pick_idlest(struct sched_group *idlest,
> >  			return false;
> >  		break;
> >  
> > -	case group_imbalanced:
> >  	case group_asym_packing:
> > +		if (sched_asym_prefer(group->asym_prefer_cpu, idlest->asym_prefer_cpu)) {
> > +			int busy_cpus = idlest_sgs->group_weight - idlest_sgs->idle_cpus;
> > +
> > +			busy_cpus -= (sgs->group_weight - sgs->idle_cpus);
> > +			if (busy_cpus >= 0)
> > +				return true;
> 
> 
> Wouldn't using idle_cpus be simpler? Something like:
> 
> if (sgs->idle_cpus - idlest_sgs->idle_cpus >= 0)
> 	return true;
> 
> > +		}
> > +		return false;
> > +
> > +	case group_imbalanced:
> >  	case group_smt_balance:
> >  		/* Those types are not used in the slow wakeup path */
> >  		return false;
> > @@ -10080,7 +10096,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
> >  			sgs = &tmp_sgs;
> >  		}
> >  
> > -		update_sg_wakeup_stats(sd, group, sgs, p);
> > +		update_sg_wakeup_stats(sd, group, sgs, p, this_cpu);
> >  
> >  		if (!local_group && update_pick_idlest(idlest, &idlest_sgs, group, sgs)) {
> >  			idlest = group;
> > @@ -10112,6 +10128,17 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
> >  	if (local_sgs.group_type > idlest_sgs.group_type)
> >  		return idlest;
> >  
> > +	if (idlest_sgs.group_type == group_asym_packing) {
> > +		if (sched_asym_prefer(idlest->asym_prefer_cpu, local->asym_prefer_cpu)) {
> > +			int busy_cpus = local_sgs.group_weight - local_sgs.idle_cpus;
> > +
> > +			busy_cpus -= (idlest_sgs.group_weight - idlest_sgs.idle_cpus);
> > +			if (busy_cpus >= 0)
> > +				return idlest;
> > +		}
> > +		return NULL;
> > +	}
> 
> Same comment as above about using idle_cpus.
> 
> > +
> >  	switch (local_sgs.group_type) {
> >  	case group_overloaded:
> >  	case group_fully_busy:
> 

