Message-ID: <CAKfTPtCeoimTehLktVk4WiTEM5h9KW_4yvuxGof8p6ZOgsRE2Q@mail.gmail.com>
Date: Thu, 31 May 2018 15:02:04 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Patrick Bellasi <patrick.bellasi@....com>
Cc: Juri Lelli <juri.lelli@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Morten Rasmussen <Morten.Rasmussen@....com>,
viresh kumar <viresh.kumar@...aro.org>,
Valentin Schneider <valentin.schneider@....com>,
Quentin Perret <quentin.perret@....com>,
Luca Abeni <luca.abeni@...tannapisa.it>,
Claudio Scordino <claudio@...dence.eu.com>,
Joel Fernandes <joelaf@...gle.com>,
Alessio Balsini <alessio.balsini@...tannapisa.it>
Subject: Re: [PATCH v5 05/10] cpufreq/schedutil: get max utilization
On 31 May 2018 at 12:27, Patrick Bellasi <patrick.bellasi@....com> wrote:
>
> Hi Vincent, Juri,
>
> On 28-May 18:34, Vincent Guittot wrote:
>> On 28 May 2018 at 17:22, Juri Lelli <juri.lelli@...hat.com> wrote:
>> > On 28/05/18 16:57, Vincent Guittot wrote:
>> >> Hi Juri,
>> >>
>> >> On 28 May 2018 at 12:12, Juri Lelli <juri.lelli@...hat.com> wrote:
>> >> > Hi Vincent,
>> >> >
>> >> > On 25/05/18 15:12, Vincent Guittot wrote:
>> >> >> Now that we have both the dl class bandwidth requirement and the dl class
>> >> >> utilization, we can use the max of the 2 values when aggregating the
>> >> >> utilization of the CPU.
>> >> >>
>> >> >> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
>> >> >> ---
>> >> >> kernel/sched/sched.h | 6 +++++-
>> >> >> 1 file changed, 5 insertions(+), 1 deletion(-)
>> >> >>
>> >> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> >> >> index 4526ba6..0eb07a8 100644
>> >> >> --- a/kernel/sched/sched.h
>> >> >> +++ b/kernel/sched/sched.h
>> >> >> @@ -2194,7 +2194,11 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
>> >> >> #ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
>> >> >> static inline unsigned long cpu_util_dl(struct rq *rq)
>> >> >> {
>> >> >> - return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
>> >> >> + unsigned long util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
>> >> >
>> >> > I'd be tempted to say that we actually want to cap to this one above
>> >> > instead of using the max (as you are proposing below), or the
>> >> > (theoretical) power reduction benefits of using DEADLINE for certain
>> >> > tasks might vanish.
>> >>
>> >> The problem that I'm facing is that the sched_entity bandwidth is
>> >> removed after the 0-lag time and the rq->dl.running_bw goes back to
>> >> zero but if the DL task has preempted a CFS task, the utilization of
>> >> the CFS task will be lower than reality and schedutil will set a lower
>> >> OPP whereas the CPU is always running.
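
For reference, this is a standalone sketch of the aggregation the patch
describes (the commit message's "max of the 2 values"), with the rq/dl
structures reduced to just the fields involved; the harness is simplified
and not the exact kernel code:

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024UL
#define BW_SHIFT 20 /* running_bw is in units of 2^-20 */

struct dl_rq { unsigned long long running_bw; };
struct sched_avg { unsigned long util_avg; };
struct rq { struct dl_rq dl; struct sched_avg avg_dl; };

/* Bandwidth-based DL utilization, as in the current cpu_util_dl() */
static unsigned long dl_bw_util(struct rq *rq)
{
	return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
}

/*
 * With the patch: also consider the PELT utilization of the dl class,
 * so the signal does not collapse to zero right after the 0-lag time
 * while the CPU is in fact still busy.
 */
static unsigned long cpu_util_dl(struct rq *rq)
{
	unsigned long util = dl_bw_util(rq);

	if (rq->avg_dl.util_avg > util)
		util = rq->avg_dl.util_avg;
	return util;
}
```

While a DL task is enqueued, running_bw usually dominates; after the 0-lag
point running_bw drops to zero but avg_dl.util_avg decays slowly, which is
what keeps schedutil from picking a too-low OPP.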
>
> With UTIL_EST enabled I don't expect an OPP reduction below the
> expected utilization of a CFS task.
I'm not sure I fully catch what you mean, but all the tests that I ran
use util_est (which is enabled by default, if I'm not wrong). So all
the OPP drops that have been observed were with util_est enabled.
>
> IOW, when a periodic CFS task is preempted by a DL one, what we use
> for OPP selection once the DL task is over is still the estimated
> utilization for the CFS task itself. Thus, schedutil will eventually
> (since we have quite conservative down scaling thresholds) go down to
> the right OPP to serve that task.
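
To make sure we are talking about the same mechanism: with UTIL_EST,
schedutil's CFS contribution is the max of the decaying PELT signal and the
estimate sampled at dequeue, roughly like this simplified sketch (field
names simplified, not the exact kernel code):

```c
#include <assert.h>

struct cfs_signal {
	unsigned long util_avg; /* PELT utilization, decays while preempted */
	unsigned long util_est; /* estimate sampled when the task dequeues */
};

/*
 * Simplified cpu_util_cfs() with UTIL_EST: take the max of the two
 * signals, so a util_avg depressed by preemption is masked by the
 * last-enqueue estimate.
 */
static unsigned long cpu_util_cfs(const struct cfs_signal *s)
{
	return s->util_avg > s->util_est ? s->util_avg : s->util_est;
}
```

Note this only helps once the task has been dequeued at least once; an
always-running task never accumulates a util_est sample, which is exactly
the corner case below.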
>
>> >> The example with a RT task described in the cover letter can be
>> >> run with a DL task and will give similar results.
>
> In the cover letter you says:
>
> An rt-app use case which creates an always-running cfs thread and an
> rt thread that wakes up periodically, with both threads pinned on the
> same CPU, shows a lot of frequency switches of the CPU whereas the CPU
> never goes idle during the test.
>
> I would say that's a quite specific corner case where your always
> running CFS task has never accumulated a util_est sample.
>
> Do we really have these cases in real systems?
My example is deliberately an extreme one because it makes the problem
easier to highlight.
>
> Otherwise, it seems to me that we are trying to solve quite specific
> corner cases by adding a not negligible level of "complexity".
By complexity, do you mean taking into account the number of runnable
CFS tasks to choose between rq->dl.running_bw and avg_dl.util_avg?
I'm preparing a patchset that will provide the CFS waiting time in
addition to the dl/rt util_avg for almost no additional cost. I will
try to send the proposal later today.
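
A rough sketch of the selection I have in mind (a hypothetical helper, not
from the posted patch): prefer the pure bandwidth value while CFS has
nothing runnable, so DEADLINE keeps its power benefit, but fall back to the
larger PELT signal when CFS tasks were waiting behind DL:

```c
#include <assert.h>

struct rq_snapshot {
	unsigned long dl_bw_util;     /* running_bw scaled to capacity units */
	unsigned long dl_util_avg;    /* PELT utilization of the dl class */
	unsigned int  cfs_nr_running; /* runnable CFS tasks on this CPU */
};

/*
 * Hypothetical selection: when CFS tasks are runnable, their util_avg
 * was depressed by DL preemption, so keep the larger PELT-based DL
 * signal; otherwise the pure bandwidth request is enough and cheaper.
 */
static unsigned long dl_util_for_freq(const struct rq_snapshot *rq)
{
	if (rq->cfs_nr_running && rq->dl_util_avg > rq->dl_bw_util)
		return rq->dl_util_avg;
	return rq->dl_bw_util;
}
```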
>
> Moreover, I also have the impression that we can fix these
> use-cases by:
>
> - improving the way we accumulate samples in util_est
> i.e. by discarding preemption time
>
> - maybe by improving the utilization aggregation in schedutil to
> better understand DL requirements
> i.e. a 10% utilization with a 100ms running time is way different
> than the same utilization with a 1ms running time
>
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi