Message-ID: <20230519000126.GA24449@ranerica-svr.sc.intel.com>
Date: Thu, 18 May 2023 17:01:26 -0700
From: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
Cc: Ricardo Neri <ricardo.neri@...el.com>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Len Brown <len.brown@...el.com>, Mel Gorman <mgorman@...e.de>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Steven Rostedt <rostedt@...dmis.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Valentin Schneider <vschneid@...hat.com>,
Ionela Voinescu <ionela.voinescu@....com>, x86@...nel.org,
linux-kernel@...r.kernel.org,
"Tim C . Chen" <tim.c.chen@...el.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: [PATCH v4 05/12] sched/fair: Keep a fully_busy SMT sched group
as busiest
On Sat, May 13, 2023 at 12:11:45AM +0530, Shrikanth Hegde wrote:
>
>
> On 4/7/23 2:01 AM, Ricardo Neri wrote:
> > When comparing two fully_busy scheduling groups, keep the current busiest
> > group if it represents an SMT core. Tasks in such scheduling group share
> > CPU resources and need more help than tasks in a non-SMT fully_busy group.
> >
> > Cc: Ben Segall <bsegall@...gle.com>
> > Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@....com>
> > Cc: Ionela Voinescu <ionela.voinescu@....com>
> > Cc: Len Brown <len.brown@...el.com>
> > Cc: Mel Gorman <mgorman@...e.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
> > Cc: Steven Rostedt <rostedt@...dmis.org>
> > Cc: Tim C. Chen <tim.c.chen@...el.com>
> > Cc: Valentin Schneider <vschneid@...hat.com>
> > Cc: x86@...nel.org
> > Cc: linux-kernel@...r.kernel.org
> > Tested-by: Zhang Rui <rui.zhang@...el.com>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
> > ---
> > Changes since v3:
> > * None
> >
> > Changes since v2:
> > * Introduced this patch.
> >
> > Changes since v1:
> > * N/A
> > ---
> > kernel/sched/fair.c | 16 ++++++++++++++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index b151e93ec316..ea23a5163bfa 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9566,10 +9566,22 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > * contention when accessing shared HW resources.
> > *
> > * XXX for now avg_load is not computed and always 0 so we
> > - * select the 1st one.
> > + * select the 1st one, except if @sg is composed of SMT
> > + * siblings.
> > */
> > - if (sgs->avg_load <= busiest->avg_load)
> > +
> > + if (sgs->avg_load < busiest->avg_load)
> > return false;
> > +
> > + if (sgs->avg_load == busiest->avg_load) {
> > + /*
> > + * SMT sched groups need more help than non-SMT groups.
> > + * If @sg happens to also be SMT, either choice is good.
> > + */
> > + if (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
> > + return false;
> > + }
> > +
> > break;
>
Thank you very much for your review!
> IIUC,
>
> Earlier, we used to go to out_balanced if sgs->avg_load <= busiest->avg_load.
> Now we go only if it is less.
In this particular case we are comparing two fully_busy groups. Both
sgs->avg_load and busiest->avg_load are equal to 0.
> lets say sgs->avg_load == busiest->avg_load,
> then we will return true in the MC and DIE domains. This might end up
> traversing multiple such groups and picking the last one as the busiest
> instead of the first.
Yes, that is correct. But we traverse all sched groups from
update_sd_lb_stats() anyway. We are here because both sgs and busiest are
of type fully_busy and we need to break a tie. Previously we always kept
on selecting sgs as busiest.
> I guess eventually any load imbalance, if it exists, will be fixed. But
> this might cause a slight overhead. Would it?
>
>
>
> nit: There is typo in [2/12] if the whole core is repeated.
> + * CPUs. When done between cores, do it only if the whole core if the
> + * whole core is idle.
>
> Mentioning in this reply instead, to avoid sending another mail reply for this.
Ah! I read my patches dozens of times and I still missed this. Thank you
for noting. I will post a trivial patch to fix it.
Thanks and BR,
Ricardo