lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221020012502.GA20255@ranerica-svr.sc.intel.com>
Date:   Wed, 19 Oct 2022 18:25:02 -0700
From:   Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ricardo Neri <ricardo.neri@...el.com>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Len Brown <len.brown@...el.com>, Mel Gorman <mgorman@...e.de>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Valentin Schneider <vschneid@...hat.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org, "Tim C . Chen" <tim.c.chen@...el.com>
Subject: Re: [PATCH 1/4] sched/fair: Simplify asym_packing logic for SMT
 sched groups

On Tue, Oct 18, 2022 at 11:34:04AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 25, 2022 at 03:55:26PM -0700, Ricardo Neri wrote:
> > When the destination CPU is an SMT sibling and idle, it can only help the
> > busiest group if all of its other SMT siblings are also idle. Otherwise,
> > there is not increase in throughput.
> > 
> > It does not matter whether the busiest group has SMT siblings. Simply
> > check if there are any tasks running on the local group before proceeding.
> 
> > Reviewed-by: Len Brown <len.brown@...el.com>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
> > ---
> >  kernel/sched/fair.c | 29 +++++++++--------------------
> >  1 file changed, 9 insertions(+), 20 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 77b2048a9326..91f271ea02d2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8603,12 +8603,10 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> >  				    struct sched_group *sg)
> >  {
> >  #ifdef CONFIG_SCHED_SMT
> > -	bool local_is_smt, sg_is_smt;
> > +	bool local_is_smt;
> >  	int sg_busy_cpus;
> >  
> >  	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > -	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > -
> >  	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> >  
> >  	if (!local_is_smt) {
> > @@ -8629,25 +8627,16 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> >  		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> >  	}
> >  
> > -	/* @dst_cpu has SMT siblings. */
> > -
> > -	if (sg_is_smt) {
> > -		int local_busy_cpus = sds->local->group_weight -
> > -				      sds->local_stat.idle_cpus;
> > -		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> > -
> > -		if (busy_cpus_delta == 1)
> > -			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > -
> > -		return false;
> > -	}
> > -
> >  	/*
> > -	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
> > -	 * up with more than one busy SMT sibling and only pull tasks if there
> > -	 * are not busy CPUs (i.e., no CPU has running tasks).
> > +	 * @dst_cpu has SMT siblings. When both @dst_cpu and the busiest core
> > +	 * have one or more busy siblings, moving tasks between them results
> > +	 * in the same throughput. Only if all the siblings of @dst_cpu are
> > +	 * idle throughput can increase.
> > +	 *
> > +	 * If the difference in the number of busy CPUs is two or more, let
> > +	 * find_busiest_group() take care of it.
> >  	 */
> > -	if (!sds->local_stat.sum_nr_running)
> > +	if (sg_busy_cpus == 1 && !sds->local_stat.sum_nr_running)
> >  		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> >  
> 
> I can't follow this logic; doesn't this hard assume SMT2 at the very
> least? The case for Power7 with SMT8 is that SMT1 is faster than SMT2 is
> faster than SMT4, only once you have more than 4 threads active it no
> longer matters.

Now that I look at this again, the comment is correct but it is
implemented at wrong sched domain.

Consider in the "MC" level, two sched groups composed of SMT siblings.
Both sched group have one or more busy siblings and the difference
between busy siblings is exactly one. Pulling tasks to the sched group
with higher asym_prefer_cpu makes no difference in throughput. The
totality of CPU resources continue being share among the same number of
tasks. This is true for both SMT2 and SMT8, no?

Only when all the siblings of an SMT core are idle new CPU resources are
used.

The "MC" domain does not have the flag SD_SHARE_CPUCAPACITY. My patch
should check for this flag in the child domain.

At the "SMT" domain, yes, I agree that siblings with different priorities
should be accommodated.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ