linux-kernel - Re: [PATCH] fair/util_est: Separate util_est_dequeue() for cfs_rq_util

Message-ID: <CAKfTPtBRid+d1bkd4eEXmB_zuF3OgWODD-HipuvpJk9jg6UqMg@mail.gmail.com>
Date:   Tue, 15 Dec 2020 10:39:11 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Ryan Y <xuewen.yan94@...il.com>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Benjamin Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Xuewen Yan <Xuewen.Yan@...soc.com>,
        Ryan Y <xuewyan@...mail.com>,
        Chunyan Zhang <zhang.lyra@...il.com>,
        王科 (Ke Wang) <Ke.Wang@...soc.com>
Subject: Re: [PATCH] fair/util_est: Separate util_est_dequeue() for cfs_rq_util_change

On Mon, 14 Dec 2020 at 19:46, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>
> On 11/12/2020 13:03, Ryan Y wrote:
> > Hi Dietmar,
> >
> > Yes! That's exactly what I meant.
> >
> >> The issue is that sugov_update_[shared\|single] -> sugov_get_util() ->
> >> cpu_util_cfs() operates on an old  cfs_rq->avg.util_est.enqueued value?
> >
> > Well, because of this, when p is dequeued, _task_util_est(p) should be
> > subtracted before cfs_rq_util_change().
> > However, the original util_est_dequeue() both dequeues the util_est and
> > updates p->se.avg.util_est in one step.
> > So I split the original util_est_dequeue() to deal with the issue.
>
> OK, I see.
>
> I ran a testcase '50% periodic task 'task0-0' (8ms/16ms)' with
> PELT + proprietary trace events within dequeue_task_fair() call:
>
> task0-0-1710 [002] 218.215535: sched_pelt_se:      cpu=2 path=(null) comm=task0-0 pid=1710 load=596 runnable=597 util=597 update_time=218123022336
> task0-0-1710 [002] 218.215536: sched_pelt_cfs:     cpu=2 path=/ load=597 runnable=597 util=597 update_time=218123022336
> task0-0-1710 [002] 218.215538: bprint:             sugov_get_util: CPU2 rq->cfs.avg.util_avg=597 rq->cfs.avg.util_est.enqueued=601
> task0-0-1710 [002] 218.215540: sched_util_est_cfs: cpu=2 path=/ enqueued=0 ewma=0 util=597
> task0-0-1710 [002] 218.215542: bprint:             dequeue_task_fair: CPU2 [task0-0 1710] rq->cfs.avg.util_avg=[576->597] rq->cfs.avg.util_est.enqueued=[601->0]
>
> It's true that 'sugov_get_util() -> cpu_util_cfs()' can use
> rq->cfs.avg.util_est.enqueued before _task_util_est(p) is subtracted
> from it.
>
> But isn't rq->cfs.avg.util_est.enqueued (in this case 601) always close
> to rq->cfs.avg.util_avg (597) since the task was just running?
> The cfs_rq utilization contains a blocked (sleeping) task.

There will be a difference if the task alternates long and short runs,
in which case util_avg is lower than util_est. But even in this case,
the freq will be updated at the next enqueue/dequeue/tick.
The only real case could be when the cpu goes idle in a shallow state (WFI),
which is impacted by the freq/voltage. But even in this case, the
situation should not last long.

That being said, I agree that the value used by schedutil is not
correct at dequeue.
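
To make the size of the error concrete, here is a minimal user-space
sketch (not kernel code) of the max selection quoted below from
cpu_util_cfs(). The struct and function names are made up for
illustration; only the two field meanings come from this thread.

```c
#include <assert.h>

/* Hypothetical model of the two rq->cfs.avg values discussed in this thread. */
struct cfs_util_model {
	unsigned long util_avg;          /* models rq->cfs.avg.util_avg */
	unsigned long util_est_enqueued; /* models rq->cfs.avg.util_est.enqueued */
};

/* Models the max selection in cpu_util_cfs() when UTIL_EST is enabled. */
unsigned long model_cpu_util_cfs(const struct cfs_util_model *m)
{
	assert(m);
	return m->util_avg > m->util_est_enqueued ?
	       m->util_avg : m->util_est_enqueued;
}

/* How far the selected value sits above util_avg (0 if util_avg wins). */
unsigned long model_overestimate(const struct cfs_util_model *m)
{
	return model_cpu_util_cfs(m) - m->util_avg;
}
```

With the values from Dietmar's trace (util_avg=597, enqueued=601) the
stale estimate adds only 4. With an assumed alternating task, say
util_avg=200 and enqueued=600 (made-up numbers), it would add 400 until
the next update.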

>
> If I ran with your patch, cpu_util_cfs() would choose between 597 and 0,
> whereas without it, it chooses between 597 and 601.
>
> Do you have a specific use case in mind? Or even test results showing a benefit
> of your patch?
>
> > Dietmar Eggemann <dietmar.eggemann@....com> wrote on Fri, 11 Dec 2020 at 19:30:
> >
> >> Hi Yan,
> >>
> >> On 09/12/2020 11:44, Xuewen Yan wrote:
> >>> When a task is dequeued, its util is updated, and cfs_rq_util_change()
> >>> checks the rq's util. If cfs_rq->avg.util_est.enqueued is bigger
> >>> than cfs_rq->avg.util_avg but has not been decreased yet, the
> >>> cfs_rq util comes out too big by mistake, and as a result
> >>> cfs_rq_util_change() may change the freq unreasonably.
> >>>
> >>> Separate util_est_dequeue() into util_est_dequeue() and
> >>> util_est_update(), and dequeue _task_util_est(p) before updating util.
> >>
> >> The issue is that sugov_update_[shared\|single] -> sugov_get_util() ->
> >> cpu_util_cfs() operates on an old  cfs_rq->avg.util_est.enqueued value?
> >>
> >> cpu_util_cfs()
> >>
> >>     if (sched_feat(UTIL_EST))
> >>         util = max_t(util, READ_ONCE(rq->cfs.avg.util_est.enqueued))
> >>                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>
> >> dequeue_task_fair() (w/ your patch, moving (1) before (2))
> >>
> >>     /* (1) update cfs_rq->avg.util_est.enqueued */
> >>     util_est_dequeue()
> >>
> >>     /* (2) potential p->se.avg.util_avg update */
> >>     /* 2 for loops */
> >>     for_each_sched_entity()
> >>
> >>         /* this can only lead to a freq change for a root cfs_rq */
> >>         (dequeue_entity() ->) update_load_avg() -> cfs_rq_util_change()
> >>          -> cpufreq_update_util() ->...-> sugov_update_[shared\|single]
> >>
> >>     /* (3) potential update p->se.avg.util_est */
> >>     util_est_update()
> >>
> >>
> >> We do need (3) after (2) because of:
> >>
> >> util_est_update()
> >>     ...
> >>     ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
> >>     ...           ^^^^^^^^^^^^^
> >>                   p->se.avg.util_avg
> >>
> >>
> >> Did I get this right?
> >>
> >> [...]
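
The ordering of steps (1) to (3) quoted above can be sketched in user
space as follows. This is a simplified model, not the patch itself: the
struct, the function names and the patched_order flag are all
hypothetical, and step (3), the update of the task's own estimate, is
left out.

```c
#include <assert.h>

/* Hypothetical model of the root cfs_rq values read by schedutil. */
struct rq_model {
	unsigned long util_avg;          /* models rq->cfs.avg.util_avg */
	unsigned long util_est_enqueued; /* models rq->cfs.avg.util_est.enqueued */
};

/* What the governor would read during step (2): max of the two values. */
static unsigned long model_governor_view(const struct rq_model *rq)
{
	return rq->util_avg > rq->util_est_enqueued ?
	       rq->util_avg : rq->util_est_enqueued;
}

/* Step (1): remove the task's estimate from the rq, clamped at zero. */
static void model_util_est_dequeue(struct rq_model *rq, unsigned long task_est)
{
	rq->util_est_enqueued -= task_est < rq->util_est_enqueued ?
				 task_est : rq->util_est_enqueued;
}

/*
 * Returns the utilization seen by the governor while the task is dequeued.
 * patched_order != 0 runs (1) before (2), as the patch proposes;
 * patched_order == 0 models the unpatched order, where the subtraction
 * only happens after the governor has already read the value.
 */
unsigned long model_dequeue_seen_util(struct rq_model *rq,
				      unsigned long task_est, int patched_order)
{
	unsigned long seen;

	assert(rq);
	if (patched_order)
		model_util_est_dequeue(rq, task_est);
	seen = model_governor_view(rq);	/* step (2) */
	if (!patched_order)
		model_util_est_dequeue(rq, task_est);
	return seen;
}
```

Fed with the trace values from this thread (util_avg=597,
enqueued=601, task estimate 601), the unpatched order reports 601 to
the governor and the patched order reports 597.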