Date:   Thu, 31 May 2018 11:27:36 +0100
From:   Patrick Bellasi <patrick.bellasi@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Juri Lelli <juri.lelli@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <Morten.Rasmussen@....com>,
        viresh kumar <viresh.kumar@...aro.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Quentin Perret <quentin.perret@....com>,
        Luca Abeni <luca.abeni@...tannapisa.it>,
        Claudio Scordino <claudio@...dence.eu.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Alessio Balsini <alessio.balsini@...tannapisa.it>
Subject: Re: [PATCH v5 05/10] cpufreq/schedutil: get max utilization


Hi Vincent, Juri,

On 28-May 18:34, Vincent Guittot wrote:
> On 28 May 2018 at 17:22, Juri Lelli <juri.lelli@...hat.com> wrote:
> > On 28/05/18 16:57, Vincent Guittot wrote:
> >> Hi Juri,
> >>
> >> On 28 May 2018 at 12:12, Juri Lelli <juri.lelli@...hat.com> wrote:
> >> > Hi Vincent,
> >> >
> >> > On 25/05/18 15:12, Vincent Guittot wrote:
> >> >> Now that we have both the dl class bandwidth requirement and the dl class
> >> >> utilization, we can use the max of the two values when aggregating the
> >> >> utilization of the CPU.
> >> >>
> >> >> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> >> >> ---
> >> >>  kernel/sched/sched.h | 6 +++++-
> >> >>  1 file changed, 5 insertions(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >> >> index 4526ba6..0eb07a8 100644
> >> >> --- a/kernel/sched/sched.h
> >> >> +++ b/kernel/sched/sched.h
> >> >> @@ -2194,7 +2194,11 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> >> >>  #ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
> >> >>  static inline unsigned long cpu_util_dl(struct rq *rq)
> >> >>  {
> >> >> -     return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> >> >> +     unsigned long util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> >> >
> >> > I'd be tempted to say the we actually want to cap to this one above
> >> > instead of using the max (as you are proposing below) or the
> >> > (theoretical) power reduction benefits of using DEADLINE for certain
> >> > tasks might vanish.
> >>
> >> The problem that I'm facing is that the sched_entity bandwidth is
> >> removed after the 0-lag time and the rq->dl.running_bw goes back to
> >> zero but if the DL task has preempted a CFS task, the utilization of
> >> the CFS task will be lower than reality and schedutil will set a lower
> >> OPP whereas the CPU is always running.

With UTIL_EST enabled I don't expect an OPP reduction below the
expected utilization of a CFS task.

IOW, when a periodic CFS task is preempted by a DL one, what we use
for OPP selection once the DL task is over is still the estimated
utilization for the CFS task itself. Thus, schedutil will eventually
(since we have quite conservative down scaling thresholds) go down to
the right OPP to serve that task.

> >> The example with a RT task described in the cover letter can be
> >> run with a DL task and will give similar results.

In the cover letter you say:

   A rt-app use case which creates an always-running cfs thread and an
   rt thread that wakes up periodically, with both threads pinned on
   the same CPU, shows a lot of frequency switches of the CPU whereas
   the CPU never goes idle during the test.

I would say that's a quite specific corner case where your always
running CFS task has never accumulated a util_est sample.

Do we really have these cases in real systems?

Otherwise, it seems to me that we are trying to solve quite specific
corner cases by adding a non-negligible level of "complexity".

Moreover, I also have the impression that we can fix these
use-cases by:

  - improving the way we accumulate samples in util_est
    i.e. by discarding preemption time

  - maybe by improving the utilization aggregation in schedutil to
    better understand DL requirements
    i.e. a 10% utilization with a 100ms running time is way different
    than the same utilization with a 1ms running time


-- 
#include <best/regards.h>

Patrick Bellasi
