Date:   Tue, 4 Jan 2022 14:48:02 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, linux-kernel@...r.kernel.org,
        rickyiu@...gle.com, odin@...d.al, sachinp@...ux.vnet.ibm.com,
        naresh.kamboju@...aro.org
Subject: Re: [PATCH v2 1/3] sched/pelt: Don't sync hardly util_sum with uti_avg

On Tue, 4 Jan 2022 at 12:47, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>
> On 22/12/2021 10:38, Vincent Guittot wrote:
>
> s/util_sum with uti_avg/util_sum with util_avg
>
> [...]
>
> > +#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
>
> Shouldn't this be in pelt.h?

It is only used in fair.c, so I kept it local, like some other defines in fair.c.
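
As a rough illustration of why MIN_DIVIDER is a safe lower bound, the sketch below models the divider range in standalone C. It assumes the mainline value LOAD_AVG_MAX = 47742 and that period_contrib stays in [0, 1024); pelt_divider() is a hypothetical stand-in for get_pelt_divider(), not the kernel function itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mainline PELT constant; MIN_DIVIDER as defined in the patch. */
#define LOAD_AVG_MAX 47742
#define MIN_DIVIDER  (LOAD_AVG_MAX - 1024)

/*
 * Hypothetical stand-in for get_pelt_divider(): the divider grows with
 * the part of the current 1024us period that has already elapsed.
 */
static uint32_t pelt_divider(uint32_t period_contrib)
{
	assert(period_contrib < 1024);
	return MIN_DIVIDER + period_contrib;
}
```

Under these assumptions the divider can never be smaller than MIN_DIVIDER, whatever period_contrib was at the last sync, so util_avg * MIN_DIVIDER is the smallest util_sum consistent with a given util_avg.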


>
> [...]
>
> > @@ -3466,13 +3466,30 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
> >        */
> >       divider = get_pelt_divider(&cfs_rq->avg);
> >
> > +
> >       /* Set new sched_entity's utilization */
> >       se->avg.util_avg = gcfs_rq->avg.util_avg;
> > -     se->avg.util_sum = se->avg.util_avg * divider;
> > +     new_sum = se->avg.util_avg * divider;
> > +     delta_sum = (long)new_sum - (long)se->avg.util_sum;
> > +     se->avg.util_sum = new_sum;
> >
> >       /* Update parent cfs_rq utilization */
> > -     add_positive(&cfs_rq->avg.util_avg, delta);
> > -     cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
> > +     add_positive(&cfs_rq->avg.util_avg, delta_avg);
> > +     add_positive(&cfs_rq->avg.util_sum, delta_sum);
> > +
> > +     /*
> > +      * Because of rounding, se->util_sum might ends up being +1 more than
> > +      * cfs->util_sum (XXX fix the rounding). Although this is not
> > +      * a problem by itself, detaching a lot of tasks with the rounding
> > +      * problem between 2 updates of util_avg (~1ms) can make cfs->util_sum
> > +      * becoming null whereas cfs_util_avg is not.
> > +      * Check that util_sum is still above its lower bound for the new
> > +      * util_avg. Given that period_contrib might have moved since the last
> > +      * sync, we are only sure that util_sum must be above or equal to
> > +      *    util_avg * minimum possible divider
> > +      */
> > +     cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
> > +                                       cfs_rq->avg.util_avg * MIN_DIVIDER);
> >  }
> >
>
> I still wonder whether the regression only comes from the changes in
> update_cfs_rq_load_avg(), introduced by 1c35b07e6d39.
> And could be fixed only by this part of the patch-set (A):
>
> @@ -3677,15 +3706,22 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>
>     r = removed_load;
>     sub_positive(&sa->load_avg, r);
> -   sa->load_sum = sa->load_avg * divider;
> +   sub_positive(&sa->load_sum, r * divider);
> +   sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * MIN_DIVIDER);
>
>     r = removed_util;
>     sub_positive(&sa->util_avg, r);
> -   sa->util_sum = sa->util_avg * divider;
> +   sub_positive(&sa->util_sum, r * divider);
> +   sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
>
>     r = removed_runnable;
>     sub_positive(&sa->runnable_avg, r);
> -   sa->runnable_sum = sa->runnable_avg * divider;
> +   sub_positive(&sa->runnable_sum, r * divider);
> +   sa->runnable_sum = max_t(u32, sa->runnable_sum,
> +                                 sa->runnable_avg * MIN_DIVIDER);
>
> i.e. w/o changing update_tg_cfs_X() (and
> detach_entity_load_avg()/dequeue_load_avg()).
>
> update_load_avg()
>   update_cfs_rq_load_avg()    <---
>   propagate_entity_load_avg()
>     update_tg_cfs_X()         <---
>
>
> I didn't see the SCHED_WARN_ON() [cfs_rq_is_decayed()] when looping on
> hackbench in several different sched group levels on
> [Hikey620 (Arm64, 8 CPUs, SMP, 4 taskgroups: A/B C/D E/F G/H), >12h uptime].
>
> Rick is probably in a position to test whether this would be sufficient
> to cure the CPU frequency regression.
>
> I can see that you want to use the same _avg/_sum sync in
> detach_entity_load_avg()/dequeue_load_avg() as in
> update_cfs_rq_load_avg(). (B)
>
> And finally in update_tg_cfs_X() as well plus down-propagating _sum
> independently from _avg. (C)
>
> So rather than splitting the patchset by X (util, runnable, load), the
> whole change might be easier to handle IMHO when split into (A), (B), (C)
> (obviously only in case (A) cures the regression).
>
> >  static inline void
> > @@ -3681,7 +3698,9 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
> >
> >               r = removed_util;
> >               sub_positive(&sa->util_avg, r);
> > -             sa->util_sum = sa->util_avg * divider;
> > +             sub_positive(&sa->util_sum, r * divider);
> > +             /* See update_tg_cfs_util() */
> > +             sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
> >
> >               r = removed_runnable;
> >               sub_positive(&sa->runnable_avg, r);
> > @@ -3780,7 +3799,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> >
> >       dequeue_load_avg(cfs_rq, se);
> >       sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
> > -     cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
> > +     sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
> > +     /* See update_tg_cfs_util() */
> > +     cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
> > +                                       cfs_rq->avg.util_avg * MIN_DIVIDER);
> > +
>
> Maybe add a:
>
> Fixes: fcf6631f3736 ("sched/pelt: Ensure that *_sum is always synced with *_avg")
>
> [...]
>
> This max_t() should make sure that `_sum is always >= _avg *
> MIN_DIVIDER`, which is sometimes not the case. Currently this is done in
>
> (1) update_cfs_rq_load_avg()
> (2) detach_entity_load_avg() and dequeue_load_avg()
> (3) update_tg_cfs_X()
>
> but not in attach_entity_load_avg(), enqueue_load_avg(). What's the
> reason for this?
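
For reference, the subtract-then-clamp pattern used at the removal sites (1) to (3) can be sketched in standalone C. This is only a model under stated assumptions: struct avg_sketch, sub_floor0() and remove_util() are hypothetical names, LOAD_AVG_MAX = 47742 is the assumed mainline value, and sub_floor0() only mimics the "do not go below zero" behaviour of sub_positive().

```c
#include <stdint.h>

#define LOAD_AVG_MAX 47742                 /* assumed mainline value */
#define MIN_DIVIDER  (LOAD_AVG_MAX - 1024)

/* Hypothetical, reduced model of the util part of struct sched_avg. */
struct avg_sketch {
	uint32_t util_avg;
	uint32_t util_sum;
};

/* Subtract without underflow, mimicking sub_positive(). */
static void sub_floor0(uint32_t *val, uint32_t delta)
{
	*val = (*val > delta) ? *val - delta : 0;
}

/*
 * Remove a contribution from both _avg and _sum independently, then make
 * sure _sum is not below the smallest value consistent with the new _avg.
 */
static void remove_util(struct avg_sketch *sa, uint32_t r_avg, uint32_t r_sum)
{
	uint32_t lower_bound;

	sub_floor0(&sa->util_avg, r_avg);
	sub_floor0(&sa->util_sum, r_sum);

	lower_bound = sa->util_avg * MIN_DIVIDER;
	if (sa->util_sum < lower_bound)
		sa->util_sum = lower_bound;
}
```

In this model the clamp only matters after a subtraction, where the removed _sum can be larger than what the remaining _avg accounts for; an addition raises _sum and _avg together.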
