linux-kernel - Re: [RFC PATCH 1/9] sched,cgroup: Add interface for latency-nice

Message-Id: <242c8410-616c-51b2-7aad-4d92ac3a149f@linux.ibm.com>
Date:   Fri, 6 Sep 2019 18:01:19 +0530
From:   Parth Shah <parth@...ux.ibm.com>
To:     Patrick Bellasi <patrick.bellasi@....com>,
        Subhra Mazumdar <subhra.mazumdar@...cle.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com, tglx@...utronix.de,
        steven.sistare@...cle.com, dhaval.giani@...cle.com,
        daniel.lezcano@...aro.org, vincent.guittot@...aro.org,
        viresh.kumar@...aro.org, tim.c.chen@...ux.intel.com,
        mgorman@...hsingularity.net
Subject: Re: [RFC PATCH 1/9] sched,cgroup: Add interface for latency-nice



On 9/5/19 3:15 PM, Patrick Bellasi wrote:
> 
> On Thu, Sep 05, 2019 at 09:31:27 +0100, Peter Zijlstra wrote...
> 
>> On Fri, Aug 30, 2019 at 10:49:36AM -0700, subhra mazumdar wrote:
>>> Add Cgroup interface for latency-nice. Each CPU Cgroup adds a new file
>>> "latency-nice" which is shared by all the threads in that Cgroup.
>>
>> *sigh*, no. We start with a normal per task attribute, and then later,
>> if it is needed and makes sense, we add it to cgroups.
> 
> FWIW, to add on top of what Peter says, we used this same approach for
> uclamp and it proved to be a very effective way to come up with a good
> design. General principles have been:
> 
>  - a system-wide API [1] (under /proc/sys/kernel/sched_*) defines
>    default values for all tasks affected by that feature.
>    This interface also has to define upper bounds for task-specific
>    values. Thus, in the case of latency-nice, it should be set by
>    default to the MIN value, since that's the current mainline
>    behaviour: all tasks are latency sensitive.
> 
>  - a per-task API [2] (via the sched_setattr() syscall) can be used to
>    relax the system-wide setting, thus implementing a "nice" policy.
> 
>  - a per-taskgroup API [3] (via cpu controller's attributes) can be used
>    to relax the system-wide settings and restrict the per-task API.
> 
> The above features are worth adding in that exact order.
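
A minimal sketch of how the three layers above could combine into one effective value, under one possible reading: the system-wide knob and the per-taskgroup attribute act as floors on latency-nice (i.e., bounds on how latency sensitive a task may become), and the per-task value can only relax within them. Every name here (sysctl_sched_latency_nice, effective_latency_nice(), the struct fields) is hypothetical, the [0..1024] scale is the one proposed further down, and this is not the uclamp code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds, assuming the [0..1024] scale proposed later in this thread. */
#define LATENCY_NICE_MIN 0
#define LATENCY_NICE_MAX 1024

/* Hypothetical stand-in for a /proc/sys/kernel/sched_* knob; MIN matches mainline. */
static int sysctl_sched_latency_nice = LATENCY_NICE_MIN;

struct task       { int latency_nice; }; /* per-task value (hypothetical field) */
struct task_group { int latency_nice; }; /* cpu controller value (hypothetical) */

static int max_int(int a, int b) { return a > b ? a : b; }

/*
 * The system-wide knob and the task group each set a floor: a task can relax
 * (raise) its own latency-nice, but cannot become more latency sensitive
 * than either of them allows. The result is capped at LATENCY_NICE_MAX.
 */
static int effective_latency_nice(const struct task *p,
				  const struct task_group *tg)
{
	int v;

	assert(p != NULL);
	v = max_int(p->latency_nice, sysctl_sched_latency_nice);
	if (tg)
		v = max_int(v, tg->latency_nice);
	return v > LATENCY_NICE_MAX ? LATENCY_NICE_MAX : v;
}
```

With the system knob at its default of 0, a task asking for 100 would get 100 on its own and 300 inside a group whose attribute is set to 300.
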
> 
>> Also, your Changelog fails on pretty much every point. It doesn't
>> explain why, it doesn't describe anything and so on.
> 
> On the description side, I guess it's worth mentioning somewhere which
> scheduling classes this feature can be useful for. It's worth
> mentioning that it can apply only to:
> 
>  - CFS tasks: for example, at wakeup time a task with a high
>    latency-nice should avoid preempting a low latency-nice task.
>    Maybe by mapping the latency-nice value into a proper vruntime
>    normalization value?
> 

If I got this right, does this also mean that a task's latency-nice
will be mapped to prio/nice, i.e., a task with the minimum latency-nice
will have the highest priority?
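
One possible reading of the vruntime-normalization idea, sketched in plain C: latency-nice would only bias the wakeup-preemption check, while the task's load weight (and so its CPU share) would still come from the regular nice value. The sketch is modelled on the shape of CFS's wakeup_preempt_entity(), but the latency_nice field, latency_offset() and both constants are hypothetical, and the [0..1024] scale is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define LATENCY_NICE_MAX      1024       /* assumed [0..1024] scale             */
#define WAKEUP_GRAN_NS        1000000LL  /* assumed 1 ms wakeup granularity     */
#define MAX_LATENCY_OFFSET_NS 6000000LL  /* hypothetical offset at max niceness */

struct sched_entity {
	int64_t vruntime;     /* weighted runtime in ns, as in CFS          */
	int     latency_nice; /* hypothetical field; weight still from nice */
};

/* Scale latency-nice linearly into a vruntime penalty for the waking task. */
static int64_t latency_offset(const struct sched_entity *se)
{
	return se->latency_nice * MAX_LATENCY_OFFSET_NS / LATENCY_NICE_MAX;
}

/*
 * Returns 1 if the waking entity @se should preempt @curr. The only change
 * from the plain vruntime comparison is that @se's vruntime is pushed
 * forward by its latency offset before comparing against the granularity.
 */
static int wakeup_preempt(const struct sched_entity *curr,
			  const struct sched_entity *se)
{
	int64_t vdiff = curr->vruntime - (se->vruntime + latency_offset(se));

	return vdiff > WAKEUP_GRAN_NS;
}
```

Under this reading, a task with the minimum latency-nice would not get a larger CPU share than its nice value allows; it would only keep the current mainline wakeup behaviour, while a higher value makes it less eager to preempt.
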

>  - RT tasks: for example, at wakeup time a task with a high
>    latency-nice value could avoid preempting a CFS task.
> 

So, will this make a CFS task precede an RT task and cause priority
inversion?
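
To make the RT question concrete, here is a toy model of the suggested RT behaviour, with a hypothetical RT_NOPREEMPT_THRESHOLD and latency_nice field; mainline has no such knob. Only the wakeup check changes: the order in which classes are picked is left as is, so in this model the waking RT task is still chosen over CFS at the next pick, but it does wait for the running CFS task to reach that scheduling point.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified: only two classes, with RT ranked above fair (CFS). */
enum sched_class_id { CLASS_RT, CLASS_FAIR };

/* Hypothetical cut-off on the assumed [0..1024] latency-nice scale. */
#define RT_NOPREEMPT_THRESHOLD 512

struct task {
	enum sched_class_id cls;
	int latency_nice;   /* hypothetical per-task field */
};

/*
 * Mainline: a waking RT task always preempts a running CFS task.
 * Hypothetical tweak: an RT wakee at or above the threshold leaves it running.
 */
static bool rt_wakeup_preempts(const struct task *curr, const struct task *wakee)
{
	assert(curr->cls == CLASS_FAIR && wakee->cls == CLASS_RT);
	return wakee->latency_nice < RT_NOPREEMPT_THRESHOLD;
}

/* Class order at the next pick is untouched: a runnable RT task still wins. */
static enum sched_class_id next_pick(bool rt_runnable)
{
	return rt_runnable ? CLASS_RT : CLASS_FAIR;
}
```
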

> I'm sure there will be discussion about some of these features; that's
> why it's important, when presenting the proposal, to keep a well-defined
> distinction between the "mechanisms and API" and how we use the new
> concept to "bias" some scheduler policies.
> 
>> From just reading the above, I would expect it to have the range
>> [-20,19] just like normal nice. Apparently this is not so.
> 
> Regarding the range for the latency-nice values, I guess we have two
> options:
> 
>   - [-20..19], which makes it similar to priorities
>   downside: we quite likely end up with a kernel space representation
>   which does not match the user-space one, e.g. look at
>   task_struct::prio.
> 
>   - [0..1024], which makes it more similar to a "percentage"
> 
> Since latency-nice is a new concept, we are not constrained by POSIX,
> and IMHO the [0..1024] scale is a better fit.
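
For comparison, the offset representation mentioned for task_struct::prio: the classic nice range is shifted into the kernel's priority space by macros like the ones below, which mirror include/linux/sched/prio.h. With a [0..1024] latency-nice, the user-space value could be stored unchanged and read as a fraction of 1024; LATENCY_NICE_MIN/MAX, latency_nice_valid() and latency_nice_pct() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Classic nice, mirroring include/linux/sched/prio.h: -20..19 -> 100..139. */
#define MIN_NICE            -20
#define MAX_NICE            19
#define NICE_WIDTH          (MAX_NICE - MIN_NICE + 1)
#define MAX_RT_PRIO         100
#define DEFAULT_PRIO        (MAX_RT_PRIO + NICE_WIDTH / 2)
#define NICE_TO_PRIO(nice)  ((nice) + DEFAULT_PRIO)
#define PRIO_TO_NICE(prio)  ((prio) - DEFAULT_PRIO)

/* Hypothetical latency-nice scale: the user value is the kernel value. */
#define LATENCY_NICE_MIN 0
#define LATENCY_NICE_MAX 1024

static bool latency_nice_valid(int v)
{
	return v >= LATENCY_NICE_MIN && v <= LATENCY_NICE_MAX;
}

/* Read a latency-nice value as a rough percentage of maximum niceness. */
static int latency_nice_pct(int v)
{
	assert(latency_nice_valid(v));
	return v * 100 / LATENCY_NICE_MAX;
}
```
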
> 
> That will translate into:
> 
>   latency-nice=0 : default (current mainline) behaviour, all "biasing"
>   policies are disabled and we wake up as fast as possible
> 
>   latency-nice=1024 : maximum niceness, where for example we can imagine
>   switching a CFS task to SCHED_IDLE?
> 
> Best,
> Patrick
> 
> [1] commit e8f14172c6b1 ("sched/uclamp: Add system default clamps")
> [2] commit a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
> [3] 5 patches in today's tip/sched/core up to:
>     commit babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes")
>