linux-kernel - Re: [PATCH v7 2/2] sched/fair: update scale invariance of PELT

Message-ID: <CAKfTPtAR7otTTwKYbg5OWbgrUYNKBNsUnOcMS9CfQtbYspvO5A@mail.gmail.com>
Date:   Wed, 28 Nov 2018 15:55:05 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Patrick Bellasi <patrick.bellasi@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <Morten.Rasmussen@....com>,
        Paul Turner <pjt@...gle.com>, Ben Segall <bsegall@...gle.com>,
        Thara Gopinath <thara.gopinath@...aro.org>,
        pkondeti@...eaurora.org, Quentin Perret <quentin.perret@....com>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
Subject: Re: [PATCH v7 2/2] sched/fair: update scale invariance of PELT

On Wed, 28 Nov 2018 at 15:40, Patrick Bellasi <patrick.bellasi@....com> wrote:
>
> On 28-Nov 14:33, Vincent Guittot wrote:
> > On Wed, 28 Nov 2018 at 12:53, Patrick Bellasi <patrick.bellasi@....com> wrote:
> > >
> > > On 28-Nov 11:02, Peter Zijlstra wrote:
> > > > On Wed, Nov 28, 2018 at 10:54:13AM +0100, Vincent Guittot wrote:
> > > >
> > > > > Is there anything else that I should do for these patches ?
> > > >
> > > > > IIRC, Morten mentioned they break util_est; Patrick was going to explain.
> > >
> > > I guess the problem is that, once we cross the current capacity,
> > > > strictly speaking util_avg no longer represents a utilization.
> > >
> > > With the new signal this could happen and we end up storing estimated
> > > utilization samples which will overestimate the task requirements.
> > >
> > > We will have a spike in estimated utilization at next wakeup, since we
> > > use MAX(util_avg@...ueue_time, ewma). Potentially we also inflate the EWMA in
> > > case we collect multiple samples above the current capacity.
> >
> > > TBH I don't see how it's different from the current implementation with a
> > > task that was scheduled on a big core and now wakes up on a little core.
> > > The util_est is overestimated as well.
>
> While running below the capacity of a CPU, either big or LITTLE, we
> can still measure the actual used bandwidth as long as we have idle
> time. If the task is then moved to a lower-capacity core, I think
> it's still safe to assume that, likely, it would need more capacity.
>
> Why do you say it's the same ?

In the example that we used during the previous version, a task that
runs 39ms in a period of 80ms, the utilization on the big core will
reach 709, and so will util_est.
When the task migrates to a little core (512), util_est is higher than
the current CPU capacity.
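
As a rough cross-check of these numbers, here is a minimal sketch. It
assumes a continuous geometric decay with the usual 32ms PELT half-life
and a maximum utilization of 1024; the real implementation accumulates
in 1024us segments, so this is only an approximation. The function
names are hypothetical and are not kernel APIs.

```python
# Approximate steady-state PELT peak for a periodic task.
# Assumptions (not taken from the patch): 32ms half-life, max util 1024,
# continuous decay instead of 1024us segments.

HALF_LIFE_MS = 32.0
MAX_UTIL = 1024
LITTLE_CAPACITY = 512  # little core capacity used in the example above


def pelt_peak_util(run_ms, period_ms):
    """Utilization at the end of the running phase, at steady state.

    Over one period the signal decays by y**period_ms and gains
    MAX_UTIL * (1 - y**run_ms) while running, so the fixed point is
    U = MAX_UTIL * (1 - y**run_ms) / (1 - y**period_ms).
    """
    y = 0.5 ** (1.0 / HALF_LIFE_MS)  # per-millisecond decay factor
    return MAX_UTIL * (1 - y ** run_ms) / (1 - y ** period_ms)


def util_est_at_wakeup(last_sample, ewma):
    """Estimate used at wakeup: the larger of the last sample and the EWMA."""
    return max(last_sample, ewma)


peak = pelt_peak_util(39, 80)
print(round(peak))             # peak utilization on the big core
print(peak > LITTLE_CAPACITY)  # True: the sample exceeds the little core
```

Under these assumptions the peak comes out at about 709, and since the
estimate takes the maximum of the last sample and the EWMA, it stays
above 512 after the migration whatever the EWMA value is.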

>
> With your new signal instead, once we cross the current capacity,
> utilization is just no longer utilization. Thus, IMHO it makes sense
> to avoid accumulating a sample for what we call "estimated utilization".
>
> I would also say that, with the current implementation which caps
> utilization to the current capacity, we get better estimation in
> general. At least we can say with absolute precision:
>
>    "the task needs _at least_ that amount of capacity".
>
> Potentially we can also flag the task as being under-provisioned, in
> case there was no idle time, and _let a policy_ decide what to do
> with it and the granted information we have.
>
> With your new signal, however, once we are over the current capacity,
> the "utilization" is just a sort of "random" number, at best useful to
> draw some conclusions about how long the task has been delayed.
>
> IOW, I fear that we are embedding a policy within a signal which is
> currently representing something very well defined: how much cpu
> bandwidth a task used. Latency/under-provisioning policies
> would perhaps be better placed somewhere else.
>
> Perhaps I've missed it in some of the previous discussions:
> have we considered/discussed this signal-vs-policy aspect?
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi