Message-ID: <45eb0110-f06e-c9c8-ad0b-16349976ffa3@evidence.eu.com>
Date:   Fri, 9 Feb 2018 14:20:32 +0100
From:   Claudio Scordino <claudio@...dence.eu.com>
To:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>
Cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Todd Kjos <tkjos@...roid.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cpufreq: schedutil: rate limits for SCHED_DEADLINE



On 09/02/2018 13:56, Rafael J. Wysocki wrote:
> On Fri, Feb 9, 2018 at 1:52 PM, Juri Lelli <juri.lelli@...hat.com> wrote:
>> On 09/02/18 13:08, Rafael J. Wysocki wrote:
>>> On Fri, Feb 9, 2018 at 12:51 PM, Juri Lelli <juri.lelli@...hat.com> wrote:
>>>> On 09/02/18 12:37, Rafael J. Wysocki wrote:
>>>>> On Fri, Feb 9, 2018 at 12:26 PM, Juri Lelli <juri.lelli@...hat.com> wrote:
>>>>>> On 09/02/18 12:04, Rafael J. Wysocki wrote:
>>>>>>> On Fri, Feb 9, 2018 at 11:53 AM, Juri Lelli <juri.lelli@...hat.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 09/02/18 11:36, Rafael J. Wysocki wrote:
>>>>>>>>> On Friday, February 9, 2018 9:02:34 AM CET Claudio Scordino wrote:
>>>>>>>>>> Hi Viresh,
>>>>>>>>>>
>>>>>>>>>> On 09/02/2018 04:51, Viresh Kumar wrote:
>>>>>>>>>>> On 08-02-18, 18:01, Claudio Scordino wrote:
>>>>>>>>>>>> When the SCHED_DEADLINE scheduling class increases the CPU utilization,
>>>>>>>>>>>> we should not wait for the rate limit, otherwise we may miss some deadline.
>>>>>>>>>>>>
>>>>>>>>>>>> Tests using rt-app on Exynos5422 have shown reductions of about 10% of deadline
>>>>>>>>>>>> misses for tasks with low RT periods.
>>>>>>>>>>>>
>>>>>>>>>>>> The patch applies on top of the one recently proposed by Peter to drop the
>>>>>>>>>>>> SCHED_CPUFREQ_* flags.
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [cut]
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Is it possible to (somehow) check here if the DL tasks will miss
>>>>>>>>>>> their deadlines if we continue to run at the current frequency? And
>>>>>>>>>>> only ignore the rate limit if that is the case?
>>>>>>>>
>>>>>>>> Isn't it always the case? The utilization associated with DL tasks is
>>>>>>>> given by what the user said is needed to meet a task's deadlines
>>>>>>>> (admission control). If that task wakes up and we realize that adding
>>>>>>>> its utilization contribution is going to require a frequency change, we
>>>>>>>> should _theoretically_ always do it, or it will be too late. Now, the
>>>>>>>> user might have asked for a bit more than what is strictly required
>>>>>>>> (this is usually the case to compensate for discrepancies between theory
>>>>>>>> and the real world, e.g. hw transition limits), but I don't think there
>>>>>>>> is a way to know "how much". :/
>>>>>>>
>>>>>>> You are right.
>>>>>>>
>>>>>>> I'm somewhat concerned about "fast switch" cases when the rate limit
>>>>>>> is used to reduce overhead.
>>>>>>
>>>>>> Mmm, right. I'm thinking that in those cases we could leave rate limit
>>>>>> as is. The user should then be aware of it and consider it as proper
>>>>>> overhead when designing her/his system.
>>>>>>
>>>>>> But then, isn't it the same for "non fast switch" platforms? I mean,
>>>>>> even in the latter case we can't go faster than hw limits.. mmm, maybe
>>>>>> the difference is that in the former case we could go as fast as theory
>>>>>> would expect.. but we shouldn't. :)
>>>>>
>>>>> Well, in practical terms that means "no difference" IMO. :-)
>>>>>
>>>>> I can imagine that in some cases this approach may lead to better
>>>>> results than reducing the rate limit overall, but the general case I'm
>>>>> not sure about.
>>>>>
>>>>> I mean, if overriding the rate limit doesn't take place very often,
>>>>> then it really should make no difference overhead-wise.  Now, of
>>>>> course, how to define "not very often" is a good question as that
>>>>> leads to rate-limiting the overriding of the original rate limit and
>>>>> that scheme may continue indefinitely ...
>>>>
>>>> :)
>>>>
>>>> My impression is that rate limit helps a lot for CFS, where the "true"
>>>> utilization is not known in advance, and being too responsive might
>>>> actually be counterproductive.
>>>>
>>>> For DEADLINE (and RT, with differences) we should always respond as
>>>> quickly as we can (and probably remember that a frequency transition was
>>>> requested if hw was already performing one, but that's another patch)
>>>> because, if we don't, a task belonging to a lower priority class might
>>>> induce deadline misses in higher priority activities. E.g., a CFS task
>>>> that happens to trigger a freq switch right before a DEADLINE task wakes
>>>> up and needs a higher frequency to meet its deadline: if we wait for
>>>> the rate limit of the CFS-originated transition.. deadline miss!
>>>
>>> Fair enough, but if there's too much overhead as a result of this, you
>>> can't guarantee the deadlines to be met anyway.
>>
>> Indeed. I guess this only works if corner cases like the one above don't
>> happen too often.
> 
> Well, that's the point.
> 
> So there is a tradeoff: do we want to allow deadlines to be missed
> because of excessive overhead or do we want to allow deadlines to be
> missed because of the rate limit.

With only a few tasks, the tests have indeed shown that the approach pays off: we get a significant reduction in deadline misses with a negligible increase in energy consumption.
I still need to check what happens with a large number of tasks, trying to reproduce the "ramp up" pattern (in which DL keeps increasing the utilization, ignoring the rate limit and adding overhead).
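
To make the "ramp up" concern concrete, here is a minimal user-space sketch of the decision being discussed. It is not the kernel code and not the patch itself: struct policy_state, should_update_freq(), record_update() and count_updates() are hypothetical names, and the rule "a DL utilization increase bypasses the rate limit" is my simplification of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified policy state (not the kernel's data structure). */
struct policy_state {
	uint64_t last_freq_update_ns;	/* time of the last frequency change */
	uint64_t rate_limit_ns;		/* minimum gap between two changes */
	uint64_t last_dl_util;		/* DL utilization at the last change */
};

/* Return true if a frequency update may be issued at time now_ns. */
static bool should_update_freq(const struct policy_state *p,
			       uint64_t now_ns, uint64_t dl_util)
{
	/* Time is assumed to be monotonic in this sketch. */
	assert(now_ns >= p->last_freq_update_ns);

	/* Assumed rule: an increase of DL utilization ignores the rate limit. */
	if (dl_util > p->last_dl_util)
		return true;

	/* Otherwise the usual rate limit applies. */
	return now_ns - p->last_freq_update_ns >= p->rate_limit_ns;
}

/* Remember when the frequency was changed and for which DL utilization. */
static void record_update(struct policy_state *p,
			  uint64_t now_ns, uint64_t dl_util)
{
	p->last_freq_update_ns = now_ns;
	p->last_dl_util = dl_util;
}

/* Replay n events and count how many of them trigger a frequency update. */
static unsigned int count_updates(struct policy_state *p,
				  const uint64_t *times_ns,
				  const uint64_t *dl_utils, size_t n)
{
	unsigned int updates = 0;

	for (size_t i = 0; i < n; i++) {
		if (should_update_freq(p, times_ns[i], dl_utils[i])) {
			record_update(p, times_ns[i], dl_utils[i]);
			updates++;
		}
	}
	return updates;
}
```

In this model, a sequence of wakeups that each raise the DL utilization triggers one frequency update per wakeup, however short the interval between them, which is exactly the overhead that the rate limit was meant to bound.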

Thanks,

               Claudio