Message-ID: <20230519000126.GA24449@ranerica-svr.sc.intel.com>
Date: Thu, 18 May 2023 17:01:26 -0700
From: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
Cc: Ricardo Neri <ricardo.neri@...el.com>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Len Brown <len.brown@...el.com>, Mel Gorman <mgorman@...e.de>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Steven Rostedt <rostedt@...dmis.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Valentin Schneider <vschneid@...hat.com>,
Ionela Voinescu <ionela.voinescu@....com>, x86@...nel.org,
linux-kernel@...r.kernel.org,
"Tim C . Chen" <tim.c.chen@...el.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: [PATCH v4 05/12] sched/fair: Keep a fully_busy SMT sched group
as busiest
On Sat, May 13, 2023 at 12:11:45AM +0530, Shrikanth Hegde wrote:
>
>
> On 4/7/23 2:01 AM, Ricardo Neri wrote:
> > When comparing two fully_busy scheduling groups, keep the current busiest
> > group if it represents an SMT core. Tasks in such scheduling group share
> > CPU resources and need more help than tasks in a non-SMT fully_busy group.
> >
> > Cc: Ben Segall <bsegall@...gle.com>
> > Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@....com>
> > Cc: Ionela Voinescu <ionela.voinescu@....com>
> > Cc: Len Brown <len.brown@...el.com>
> > Cc: Mel Gorman <mgorman@...e.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
> > Cc: Steven Rostedt <rostedt@...dmis.org>
> > Cc: Tim C. Chen <tim.c.chen@...el.com>
> > Cc: Valentin Schneider <vschneid@...hat.com>
> > Cc: x86@...nel.org
> > Cc: linux-kernel@...r.kernel.org
> > Tested-by: Zhang Rui <rui.zhang@...el.com>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
> > ---
> > Changes since v3:
> > * None
> >
> > Changes since v2:
> > * Introduced this patch.
> >
> > Changes since v1:
> > * N/A
> > ---
> > kernel/sched/fair.c | 16 ++++++++++++++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index b151e93ec316..ea23a5163bfa 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9566,10 +9566,22 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > * contention when accessing shared HW resources.
> > *
> > * XXX for now avg_load is not computed and always 0 so we
> > - * select the 1st one.
> > + * select the 1st one, except if @sg is composed of SMT
> > + * siblings.
> > */
> > - if (sgs->avg_load <= busiest->avg_load)
> > +
> > + if (sgs->avg_load < busiest->avg_load)
> > return false;
> > +
> > + if (sgs->avg_load == busiest->avg_load) {
> > + /*
> > + * SMT sched groups need more help than non-SMT groups.
> > + * If @sg happens to also be SMT, either choice is good.
> > + */
> > + if (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
> > + return false;
> > + }
> > +
> > break;
>
Thank you very much for your review!
> IIUC,
>
> Earlier, we used to go to out_balanced if sgs->avg_load <= busiest->avg_load.
> Now we go only if it is less.
In this particular case we are comparing two fully_busy groups. Both
sgs->avg_load and busiest->avg_load are equal to 0.
> lets say sgs->avg_load == busiest->avg_load,
> then we will return true in the MC and DIE domains. This might end up
> traversing multiple such groups and picking the last one as the busiest
> instead of the first.
Yes, that is correct. But we traverse all sched groups from
update_sd_lb_stats() anyway. We are here because both sgs and busiest are
of type fully_busy and we need to break a tie. Previously we always kept
on selecting sgs as busiest.
> I guess eventually any load imbalance, if it exists, will be fixed. But
> this might cause a slight overhead. Would it?
>
>
>
> nit: There is typo in [2/12] if the whole core is repeated.
> + * CPUs. When done between cores, do it only if the whole core if the
> + * whole core is idle.
>
> Mentioning in this reply instead, to avoid sending another mail reply for this.
Ah! I read my patches dozens of times and I still missed this. Thank you
for noting. I will post a trivial patch to fix it.
Thanks and BR,
Ricardo