linux-kernel - Re: [PATCH v2 1/3] sched/pelt: Don't sync hardly util_sum with uti

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtCVD40GiDEG0pnU+k-nwMAh2PSu_OXq4w3k0A0zR4cLpw@mail.gmail.com>
Date:   Wed, 5 Jan 2022 14:57:45 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, linux-kernel@...r.kernel.org,
        rickyiu@...gle.com, odin@...d.al, sachinp@...ux.vnet.ibm.com,
        naresh.kamboju@...aro.org
Subject: Re: [PATCH v2 1/3] sched/pelt: Don't sync hardly util_sum with uti_avg

On Wed, 5 Jan 2022 at 14:15, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>
> On 04/01/2022 14:42, Vincent Guittot wrote:
> > On Tue, 4 Jan 2022 at 12:47, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
> >>
> >> On 22/12/2021 10:38, Vincent Guittot wrote:
>
> [...]
>
> >> I still wonder whether the regression only comes from the changes in
> >> update_cfs_rq_load_avg(), introduced by 1c35b07e6d39.
> >> And could be fixed only by this part of the patch-set (A):
> >
> > I have been able to trigger the warning even with (A) though It took
> > much more time.
> > And I have been able to catch wrong situations  (with additional
> > traces) in the 3 places A, B and C
>
> OK. By wrong situation you mean '_sum < _avg * MIN_DIVIDER' ?

not only.
also util_sum == 0 but util_avg !=0 in different places although these
situation didn't trigger sched_warn because some other sync happened
before the periodic call of __update_blocked_fair
or if nr_running == 1 and  and task's util_avg/sum > cfs' util_avg/sum
just before removing the task

>
> >> @@ -3677,15 +3706,22 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq
> >> *cfs_rq)
> >>
> >>     r = removed_load;
> >>     sub_positive(&sa->load_avg, r);
> >> -   sa->load_sum = sa->load_avg * divider;
> >> +   sub_positive(&sa->load_sum, r * divider);
> >> +   sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * MIN_DIVIDER);
> >>
> >>     r = removed_util;
> >>     sub_positive(&sa->util_avg, r);
> >> -   sa->util_sum = sa->util_avg * divider;
> >> +   sub_positive(&sa->util_sum, r * divider);
> >> +   sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
> >>
> >>     r = removed_runnable;
> >>     sub_positive(&sa->runnable_avg, r);
> >> -   sa->runnable_sum = sa->runnable_avg * divider;
> >> +   sub_positive(&sa->runnable_sum, r * divider);
> >> +   sa->runnable_sum = max_t(u32, sa->runnable_sum,
> >> +                                 sa->runnable_avg * MIN_DIVIDER);
> >>
> >> i.e. w/o changing update_tg_cfs_X() (and
> >> detach_entity_load_avg()/dequeue_load_avg()).
> >>
> >> update_load_avg()
> >>   update_cfs_rq_load_avg()    <---
> >>   propagate_entity_load_avg()
> >>     update_tg_cfs_X()         <---
> >>
> >>
> >> I didn't see the SCHED_WARN_ON() [cfs_rq_is_decayed()] when looping on
> >> hackbench in several different sched group levels on
> >> [Hikey620 (Arm64, 8 CPUs, SMP, 4 taskgroups: A/B C/D E/F G/H), >12h uptime].
> >
> > IIRC, it was with hikey960 with cgroup v1
> > As a side note, I never trigger the problem with dragonboard845 and cgroup v2
>
> OK, just started a test on hikey960 cgroupv1. Let's see if I can catch it.
>
> [...]
>
> >>> @@ -3780,7 +3799,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> >>>
> >>>       dequeue_load_avg(cfs_rq, se);
> >>>       sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
> >>> -     cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
> >>> +     sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
> >>> +     /* See update_tg_cfs_util() */
> >>> +     cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
> >>> +                                       cfs_rq->avg.util_avg * MIN_DIVIDER);
> >>> +
> >>
> >> Maybe add a:
> >>
> >> Fixes: fcf6631f3736 ("sched/pelt: Ensure that *_sum is always synced
> >> with *_avg")
> >
> > I spent time thinking about adding fixes tag. There is no crash/warn
> > so far so should we propagate it back in LTS for better performance ?
>
> Not sure I understand. What do you mean by 'should we propagate it back
> in LTS'?

Sorry I had any stables in mind and not only LTS.

Some of the changes done in PELT signal propagation that replace
subtracting util_sum  by using util_avg * divider instead, are related
to other problems with sched group hierarchy and
throttling/unthrottling. I'm not 100% confident that using fixes tag
to backport this on stables doesn't need to backport more patches on
other areas in order to not resurrect old problems. So I wonder if
it's worth  backporting this on stables

>
> [...]
>
> >> This max_t() should make sure that `_sum is always >= _avg *
> >> MIN_DIVIDER`. Which is not the case sometimes. Currently this is done in
> >>
> >> (1) update_cfs_rq_load_avg()
> >> (2) detach_entity_load_avg() and dequeue_load_avg()
> >> (3) update_tg_cfs_X()
> >>
> >> but not in attach_entity_load_avg(), enqueue_load_avg(). What's the
> >> reason for this?
> >
> > Main reason is that I have never seen the problem.
> > Then, the problem comes from subtracting task's value whereas here we
> > always add positive value
>
> OK, I see. The add_positive()'s in update_tg_cfs_X() deal w/ `long` values.