lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 15 Dec 2020 20:56:33 +0800
From:   Xuewen Yan <xuewen.yan94@...il.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Benjamin Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Xuewen Yan <Xuewen.Yan@...soc.com>,
        Ryan Y <xuewyan@...mail.com>,
        Chunyan Zhang <zhang.lyra@...il.com>,
        王科 (Ke Wang) <Ke.Wang@...soc.com>
Subject: Re: [PATCH] fair/util_est: Separate util_est_dequeue() for cfs_rq_util_change

On Tue, Dec 15, 2020 at 5:39 PM Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Mon, 14 Dec 2020 at 19:46, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
> >
> > On 11/12/2020 13:03, Ryan Y wrote:
> > > Hi Dietmar,
> > >
> > > Yes! That's exactly what I meant.
> > >
> > >> The issue is that sugov_update_[shared\|single] -> sugov_get_util() ->
> > >> cpu_util_cfs() operates on an old  cfs_rq->avg.util_est.enqueued value?
> > >
> > > well, because of this, when the p dequeued, _task_util_est(p) should be
> > > subtracted before cfs_rq_util_change().
> > > however, the original util_est_dequeue() dequeue the util_est and update
> > > the
> > > p->se.avg.util_est together.
> > > so I separate the original util_est_dequeue() to deal with the issue.
> >
> > OK, I see.
> >
> > I ran a testcase '50% periodic task 'task0-0' (8ms/16ms)' with
> > PELT + proprietary trace events within dequeue_task_fair() call:
> >
> > task0-0-1710 [002] 218.215535: sched_pelt_se:      cpu=2 path=(null) comm=task0-0 pid=1710 load=596 runnable=597 util=597 update_time=218123022336
> > task0-0-1710 [002] 218.215536: sched_pelt_cfs:     cpu=2 path=/ load=597 runnable=597 util=597 update_time=218123022336
> > task0-0-1710 [002] 218.215538: bprint:             sugov_get_util: CPU2 rq->cfs.avg.util_avg=597 rq->cfs.avg.util_est.enqueued=601
> > task0-0-1710 [002] 218.215540: sched_util_est_cfs: cpu=2 path=/ enqueued=0 ewma=0 util=597
> > task0-0-1710 [002] 218.215542: bprint:             dequeue_task_fair: CPU2 [task0-0 1710] rq->cfs.avg.util_avg=[576->597] rq->cfs.avg.util_est.enqueued=[601->0]
> >
> > It's true that 'sugov_get_util() -> cpu_util_cfs()' can use
> > rq->cfs.avg.util_est.enqueued before _task_util_est(p) is subtracted
> > from it.
> >
> > But isn't rq->cfs.avg.util_est.enqueued (in this case 601) always close
> > to rq->cfs.avg.util_avg (597) since the task was just running?
> > The cfs_rq utilization contains a blocked (sleeping) task.
>
> There will be a difference if the task alternates long and short runs
> in which case util_avg is lower than util_est. But even in this case,
> the freq will be update at next enqueue/dequeue/tick.
> The only real case could be when cpu goes idle in shallow state (WFI)
> which is impacted by the freq/voltage. But even in this case, the
> situation should not be long
>
> That being said, I agree that the value used by schedutil is not
> correct at dequeue
>

Hi Dietmar,

as Vincent said, like the following scenario:
               running                              sleep
running        sleep
|---------------------------------------|
  |--------|

                  ^^
at the ^^ time, the util_avg is lower than util_est.
we hope that the CPU frequency would be reduced as soon as possible after
the task is dequeued. But the issue affects the sensitivity of cpu
frequency reduce.
worse, since the time, if there is no task enqueue or other scenarios where the
load is updated, the cpu may always maintain a high frequency.

if keep the util_est_dequeue as it is, are there other concerns,
or this patch would affect other places of system?


> >
> > If I would run with your patch cpu_util_cfs() would chose between 597 and 0
> > whereas without it does between 597 and 601.
> >
> > Do you have a specific use case in mind? Or even test results showing a benefit
> > of your patch?
> >
> > > Dietmar Eggemann <dietmar.eggemann@....com> 于2020年12月11日周五 下午7:30写道:
> > >
> > >> Hi Yan,
> > >>
> > >> On 09/12/2020 11:44, Xuewen Yan wrote:
> > >>> when a task dequeued, it will update it's util, and cfs_rq_util_change
> > >>> would check rq's util, if the cfs_rq->avg.util_est.enqueued is bigger
> > >>> than  cfs_rq->avg.util_avg, but because the cfs_rq->avg.util_est.enqueued
> > >>> didn't be decreased, this would cause bigger cfs_rq_util by mistake,
> > >>> as a result, cfs_rq_util_change may change freq unreasonablely.
> > >>>
> > >>> separate the util_est_dequeue() into util_est_dequeue() and
> > >>> util_est_update(), and dequeue the _task_util_est(p) before update util.
> > >>
> > >> The issue is that sugov_update_[shared\|single] -> sugov_get_util() ->
> > >> cpu_util_cfs() operates on an old  cfs_rq->avg.util_est.enqueued value?
> > >>
> > >> cpu_util_cfs()
> > >>
> > >>     if (sched_feat(UTIL_EST))
> > >>         util = max_t(util, READ_ONCE(rq->cfs.avg.util_est.enqueued))
> > >>                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >>
> > >> dequeue_task_fair() (w/ your patch, moving (1) before (2))
> > >>
> > >>     /* (1) update cfs_rq->avg.util_est.enqueued */
> > >>     util_est_dequeue()
> > >>
> > >>     /* (2) potential p->se.avg.util_avg update */
> > >>     /* 2 for loops */
> > >>     for_each_sched_entity()
> > >>
> > >>         /* this can only lead to a freq change for a root cfs_rq */
> > >>         (dequeue_entity() ->) update_load_avg() -> cfs_rq_util_change()
> > >>          -> cpufreq_update_util() ->...-> sugov_update_[shared\|single]
> > >>
> > >>     /* (3) potential update p->se.avg.util_est */
> > >>     util_est_update()
> > >>
> > >>
> > >> We do need (3) after (2) because of:
> > >>
> > >> util_est_update()
> > >>     ...
> > >>     ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED); task_util
> > >>     ...           ^^^^^^^^^^^^^
> > >>                   p->se.avg.util_avg
> > >>
> > >>
> > >> Did I get this right?
> > >>
> > >> [...]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ