Message-ID: <3cac87ec39253019bfa04a9dfd61ce40ac85cc31.camel@surriel.com>
Date: Thu, 29 Oct 2020 22:10:57 -0400
From: Rik van Riel <riel@...riel.com>
To: Vincent Guittot <vincent.guittot@...aro.org>,
Chris Mason <clm@...com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Johannes Weiner <hannes@...xchg.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fix scheduler regression from "sched/fair: Rework
load_balance()"
On Mon, 2020-10-26 at 17:52 +0100, Vincent Guittot wrote:
> On Mon, 26 Oct 2020 at 17:48, Chris Mason <clm@...com> wrote:
> > On 26 Oct 2020, at 12:20, Vincent Guittot wrote:
> >
> > > what you are suggesting is something like:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 4978964e75e5..3b6fbf33abc2 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -9156,7 +9156,8 @@ static inline void
> > > calculate_imbalance(struct
> > > lb_env *env, struct sd_lb_stats *s
> > > * emptying busiest.
> > > */
> > > if (local->group_type == group_has_spare) {
> > > - if (busiest->group_type > group_fully_busy) {
> > > + if ((busiest->group_type > group_fully_busy) &&
> > > + !(env->sd->flags & SD_SHARE_PKG_RESOURCES)) {
> > > /*
> > > * If busiest is overloaded, try to fill
> > > spare
> > > * capacity. This might end up creating
> > > spare
> > > capacity
> > >
> > > which also fixes the problem for me, and aligns the load balancer
> > > with the wakeup path regarding migrations within the LLC
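
To make concrete what the hunk above changes, here is a small
stand-alone sketch of the decision (not kernel code: the enum values,
the flag constant, the helper name and the main() driver are simplified
stand-ins, only the condition mirrors the patch). With the extra check,
an overloaded busiest group only triggers the fill-spare-capacity path
when the domain spans more than one LLC; inside an LLC the balancer
falls through to evening out task counts, which is what the wakeup
path already does.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel's group_type ordering and SD flag. */
enum group_type { group_has_spare, group_fully_busy, group_overloaded };
#define SD_SHARE_PKG_RESOURCES 0x1	/* set on domains inside a single LLC */

/*
 * Models the branch in calculate_imbalance() after the patch: when the
 * local group has spare capacity, only use the "fill spare capacity"
 * strategy against an overloaded busiest group if the domain spans more
 * than one LLC.  Inside an LLC, fall through to balancing task counts
 * instead, matching the wakeup path.
 */
static bool fill_spare_capacity(enum group_type local, enum group_type busiest,
				int sd_flags)
{
	if (local != group_has_spare)
		return false;

	return busiest > group_fully_busy &&
	       !(sd_flags & SD_SHARE_PKG_RESOURCES);
}

int main(void)
{
	/* Inside the LLC: do not pull by spare capacity, balance tasks. */
	printf("LLC domain, busiest overloaded -> fill spare? %d\n",
	       fill_spare_capacity(group_has_spare, group_overloaded,
				   SD_SHARE_PKG_RESOURCES));

	/* Across LLCs (e.g. between NUMA nodes): still fill spare capacity. */
	printf("cross-LLC domain, busiest overloaded -> fill spare? %d\n",
	       fill_spare_capacity(group_has_spare, group_overloaded, 0));

	return 0;
}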
> >
> > Vincent’s patch on top of 5.10-rc1 looks pretty great:
> >
> > Latency percentiles (usec) runtime 90 (s) (3320 total samples)
> > 50.0th: 161 (1687 samples)
> > 75.0th: 200 (817 samples)
> > 90.0th: 228 (488 samples)
> > 95.0th: 254 (164 samples)
> > *99.0th: 314 (131 samples)
> > 99.5th: 330 (17 samples)
> > 99.9th: 356 (13 samples)
> > min=29, max=358
> >
> > Next we test in prod, which probably won’t have answers until
> > tomorrow. Thanks again Vincent!
>
> Great!
>
> I'm going to run more tests on my setup as well to make sure that it
> doesn't generate unexpected side effects on other kinds of use cases.
We have been testing the patch with some pretty demanding
workloads for the past several days, and it seems to
do the trick!

With the current scheduler code from Linus' tree, plus this
patch on top, one workload performs as well as it ever did,
and the other slightly better.
--
All Rights Reversed.