Date:   Thu, 29 Oct 2020 22:10:57 -0400
From:   Rik van Riel <riel@...riel.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>,
        Chris Mason <clm@...com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Johannes Weiner <hannes@...xchg.org>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fix scheduler regression from "sched/fair: Rework
 load_balance()"

On Mon, 2020-10-26 at 17:52 +0100, Vincent Guittot wrote:
> On Mon, 26 Oct 2020 at 17:48, Chris Mason <clm@...com> wrote:
> > On 26 Oct 2020, at 12:20, Vincent Guittot wrote:
> > 
> > > what you are suggesting is something like:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 4978964e75e5..3b6fbf33abc2 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -9156,7 +9156,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> > >          * emptying busiest.
> > >          */
> > >         if (local->group_type == group_has_spare) {
> > > -               if (busiest->group_type > group_fully_busy) {
> > > +               if ((busiest->group_type > group_fully_busy) &&
> > > +                   !(env->sd->flags & SD_SHARE_PKG_RESOURCES)) {
> > >                         /*
> > >                          * If busiest is overloaded, try to fill spare
> > >                          * capacity. This might end up creating spare capacity
> > > 
> > > which also fixes the problem for me and aligns LB with the wakeup
> > > path regarding migration in the LLC
> > 
> > Vincent’s patch on top of 5.10-rc1 looks pretty great:
> > 
> > Latency percentiles (usec) runtime 90 (s) (3320 total samples)
> >          50.0th: 161 (1687 samples)
> >          75.0th: 200 (817 samples)
> >          90.0th: 228 (488 samples)
> >          95.0th: 254 (164 samples)
> >          *99.0th: 314 (131 samples)
> >          99.5th: 330 (17 samples)
> >          99.9th: 356 (13 samples)
> >          min=29, max=358
> > 
> > Next we test in prod, which probably won’t have answers until
> > tomorrow.  Thanks again Vincent!
> 
> Great !
> 
> I'm going to run more tests on my setup as well to make sure that it
> doesn't generate unexpected side effects on other kinds of use cases.

We have tested the patch with several pretty demanding
workloads for the past several days, and it seems to
do the trick!

With all the current scheduler code from the Linus tree,
plus this patch on top, performance is as good as it ever
was before with one workload, and slightly better with
the other.
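
For anyone following the diff quoted above, here is a minimal userspace
sketch of the condition it changes. It is not the kernel code: the helper
name use_util_migration() is hypothetical, the enum only mirrors the
ordering of enum group_type in kernel/sched/fair.c as of 5.10, and the
value given to SD_SHARE_PKG_RESOURCES is a placeholder (the real bit is
defined by the kernel headers). It only shows that, with the patch, the
"fill spare capacity" branch is skipped when the domain shares an LLC.

```c
#include <stdbool.h>

/* Ordering mirrors enum group_type in kernel/sched/fair.c (5.10 era). */
enum group_type {
	group_has_spare = 0,
	group_fully_busy,
	group_misfit_task,
	group_asym_packing,
	group_imbalanced,
	group_overloaded,
};

/* Placeholder bit for illustration; not the kernel's actual value. */
#define SD_SHARE_PKG_RESOURCES 0x1

/*
 * Hypothetical helper: returns true when the patched calculate_imbalance()
 * would take the "busiest is overloaded, fill spare capacity" branch.
 * local and busiest stand in for local->group_type and busiest->group_type,
 * sd_flags for env->sd->flags.
 */
static bool use_util_migration(enum group_type local,
			       enum group_type busiest,
			       int sd_flags)
{
	if (local != group_has_spare)
		return false;

	/* Unpatched code stopped here: busiest > group_fully_busy. */
	if (busiest <= group_fully_busy)
		return false;

	/* Added by the patch: never take this branch inside an LLC domain. */
	return !(sd_flags & SD_SHARE_PKG_RESOURCES);
}
```

With an overloaded busiest group and a local group that has spare
capacity, the helper returns true for a domain without the flag and
false for one with it, in which case the balancer falls through to the
code that follows the branch in calculate_imbalance().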

-- 
All Rights Reversed.

