Date:   Thu, 19 Sep 2019 12:31:57 +0530
From:   Parth Shah <parth@...ux.ibm.com>
To:     Patrick Bellasi <patrick.bellasi@....com>
Cc:     linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        subhra mazumdar <subhra.mazumdar@...cle.com>,
        tim.c.chen@...ux.intel.com,
        Valentin Schneider <valentin.schneider@....com>,
        mingo@...hat.com, morten.rasmussen@....com,
        dietmar.eggemann@....com, pjt@...gle.com,
        vincent.guittot@...aro.org, quentin.perret@....com,
        dhaval.giani@...cle.com, daniel.lezcano@...aro.org, tj@...nel.org,
        rafael.j.wysocki@...el.com, qais.yousef@....com,
        Patrick Bellasi <patrick.bellasi@...bug.net>
Subject: Re: Usecases for the per-task latency-nice attribute



On 9/18/19 7:48 PM, Patrick Bellasi wrote:
> 
> On Wed, Sep 18, 2019 at 13:41:04 +0100, Parth Shah wrote...
> 
>> Hello everyone,
> 
> Hi Parth,
> thanks for starting this discussion.
> 
> [ + patrick.bellasi@...bug.net ] my new email address, since with
> @arm.com I will not be reachable anymore starting next week.
> 

Noted. I will send a new version with a summary of the whole discussion and
add more people to CC. I will use your new address there; thanks for notifying me.

>> As per the discussion at LPC2019, a new per-task property like
>> latency-nice can be useful in certain scenarios. The scheduler can make
>> better decisions by knowing the latency requirements of a task from the
>> end-user itself.
>>
>> There has already been an effort from Subhra to introduce a per-task
>> latency-nice [1] value, and several possibilities have been identified
>> where this type of interface can be used.
>>
>> To the best of my understanding of the discussion on the mail thread
>> and at LPC2019, it seems that there are two dilemmas:
>>
>> 1. Name: What should be the name for such attr for all the possible usecases?
>> =============
>> Latency nice is the proposed name as of now where the lower value indicates
>> that the task doesn't care much for the latency
> 
> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
> I think the meaning should be the opposite.
> 

Oops, my bad. I meant to say higher value; I missed that latency-nice
should be the opposite of latency sensitivity.

But in the further scope of this discussion, I take -20 to be the lowest
value (most latency sensitive) and +19 to be the highest value (does not
care about latency), per the proposed [-20,19] range.

> A -19 latency-nice task is a task which is not willing to give up
> latency. For those tasks for example we want to reduce the wake-up
> latency at maximum.
> 
> This will keep its semantics aligned with those of process niceness
> values, which range from -20 (most favourable to the process) to 19
> (least favourable to the process).

Totally agreed upon.

> 
>> and we can spend some more time in the kernel to decide a better
>> placement of a task (to save time, energy, etc.)
> 
> Tasks with a high latency-nice value (e.g. 19) are "less sensitive to
> latency". These are tasks we want to optimize mainly for throughput and
> thus, for example, we can spend some more time to find a better task
> placement at wakeup time.
> 
> Does that make sense?

Correct. Task placement is one such optimization, which can benefit both
the server and the embedded world by saving power without compromising
much on performance.

> 
>> But there seems to be a bit of confusion on whether we want biasing as well
>> (latency-biased) or something similar, in which case "latency-nice" may
>> confuse the end-user.
> 
> AFAIU PeterZ's point was "just" that if we call it "-nice" it has to
> behave like "nice values" to avoid confusing users. But if we come up
> with a different name, maybe we will have more freedom.
> 
> Personally, I like both "latency-nice" or "latency-tolerant", where:
> 
>  - latency-nice:
>    should be easier to understand, being based on a pre-existing concept
> 
>  - latency-tolerant:
>    decouples its meaning a bit from niceness, thus giving maybe a bit
>    more freedom in its complete definition, and perhaps avoids any
>    interpretation confusion like the one I commented on above.
> 
> Fun fact: there was also the latency-nasty proposal from PaulMK :)
> 

Cool. In that sense, latency-tolerant seems to be more flexible, covering
the multiple functionalities a scheduler can provide with such userspace
hints.


>> 2. Value: What should be the range of possible values supported by this new
>> attr?
>> ==============
>> The possible values of such a task attribute still need community
>> attention. Do we need a range of values, or are binary/ternary values
>> sufficient? Should the value be signed or unsigned, and how wide should
>> the variable be (u64, s32, etc.)?
> 
> AFAIR, the proposals on the table are essentially two:
> 
>  A) use a [-20,19] range
> 
>     Which has similarities with the niceness concept and gives a minimal
>     continuous range. This can come in handy for things like scaling the
>     vruntime normalization [3]
> 
>  B) use some sort of "profile tagging"
>     e.g. background, latency-sensitive, etc...
>     
>     If I correctly got what PaulT was proposing toward the end of the
>     discussion at LPC.
> 

If I got it right, then for option B this attr could be used as a latency
flag, just like the per-process flags (e.g. PF_IDLE). If so, we could
piggyback on p->flags itself. Hence I will prefer the range, unless we
have multiple usecases which cannot get the best out of a range; a rough
sketch of the two alternatives follows below.
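
Just to illustrate the difference, here is a toy sketch of the two options
(all names are made up for the example; this is not actual kernel code):

#include <stdbool.h>

/* Option A: a signed range, stored as a new field */
struct task_attr_range {
        int latency_nice;               /* [-20, 19], like nice values */
};

/* Option B: a single flag bit, PF_IDLE-style, piggybacked on flags */
#define PF_LATENCY_TOLERANT     (1U << 0)       /* hypothetical bit */

struct task_attr_flag {
        unsigned int flags;
};

/* The range can trivially express the flag... */
static bool is_latency_tolerant(const struct task_attr_range *a)
{
        return a->latency_nice > 0;
}

/* ...but a single flag cannot express any proportional biasing. */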

> This last option deserves better exploration.
> 
> At first glance I'm more for option A, I see a range as something that:
> 
>   - gives us a bit of flexibility in terms of the possible internal
>     usages of the actual value
> 
>   - better supports some kind of linear/proportional mapping
> 
>   - still supports a "profile tagging" by (possibly) exposing to
>     user-space some kind of system-wide knobs defining thresholds that
>     map the continuous value into a "profile"
>     e.g. latency-nice >= 15: use SCHED_BATCH
> 

+1, a good list of reasons to support a range for latency-<whatever>.
For instance, the threshold mapping could look like the sketch below.
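
A rough userspace-style sketch (the threshold variables are hypothetical
knobs, not existing sysctls):

#define _GNU_SOURCE
#include <sched.h>      /* SCHED_OTHER, SCHED_BATCH, SCHED_IDLE */

/* Hypothetical system-wide thresholds, e.g. exposed via sysctl */
static int latency_nice_batch_thresh = 15;
static int latency_nice_idle_thresh  = 19;

/* Map the continuous latency-nice value into a scheduling "profile" */
static int policy_for_latency_nice(int latency_nice)
{
        if (latency_nice >= latency_nice_idle_thresh)
                return SCHED_IDLE;
        if (latency_nice >= latency_nice_batch_thresh)
                return SCHED_BATCH;
        return SCHED_OTHER;
}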

>     In the following discussion I'll call "threshold based profiling"
>     this approach.
> 
> 
>> This mail is to initiate the discussion regarding the possible usecases
>> of such a per-task attribute, and to come up with a specific name and
>> value for it.
>>
>> Hopefully, interested people will lay out the usecases which this new
>> attr can potentially help to solve or optimize.
> 
> +1
> 
>> Well, to start with, here is my usecase.
>>
>> -------------------
>> **Usecases**
>> -------------------
>>
>> $> TurboSched
>> ====================
>> TurboSched [2] tries to minimize the number of active cores in a socket by
>> packing an un-important and low-utilization (named jitter) task on an
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> We should really come up with a different name, since "jitter" clashes
> with other RT-related concepts.
> 

I agree. Based on the LPC discussion and comments from tglx, I am happy to
rename it to whatever feels functionally correct and non-confusing to the
end-user.

> Maybe we don't even need a name at all, the other two attributes you
> specify are good enough to identify those tasks: they are just "small
> background" tasks.
> 
>   small      : because of their small util_est value
>   background : because of their high latency-nice value
> 

Correct. If we have latency-nice hints plus utilization, then we can
classify those tasks for task packing.

>> already active core, and thus refrains from waking up a new core when
>> possible. This requires tagging tasks from userspace to hint which tasks
>> are unimportant, so that waking up a new core to minimize latency is
>> unnecessary for such tasks.
>> As per the discussion on the posted RFC, it would be appropriate to use
>> the task latency property, where a task with the highest latency-nice
>> value can be packed.
> 
> We should better define here what you mean by "highest" latency-nice
> value; do you really mean the top of the range, e.g. 19?
> 

Yes, here I mean the top of the range, i.e. +19 in a [-20,19] range: a
task which does not care about latency.

> Or...
> 
>> But for this specific use-case, having just a binary value to know
>> which task is latency-sensitive and which is not would be sufficient,
>> but having a range is also a good way to go, where above some threshold
>> the task can be packed.
> 
> ... yes, maybe we can reason in terms of a "threshold based profiling",
> where, for example, something like:
> 
>    /proc/sys/kernel/sched_packing_util_max    : 200
>    /proc/sys/kernel/sched_packing_latency_min : 17
> 
> means that a task with latency-nice >= 17 and util_est <= 200 will be packed?
> 

Yes, something like that; conceptually it could look like the check
sketched below.
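
A sketch of the packing decision (the knob and helper names are
hypothetical, mirroring your example knobs above):

#include <stdbool.h>

/* Hypothetical sysctl-backed knobs from the example above */
static unsigned int sched_packing_util_max    = 200;
static int          sched_packing_latency_min = 17;

/* Pack a task onto an already-active core only if it is both small
 * (low estimated utilization) and background (high latency-nice).
 */
static bool task_is_packable(unsigned int util_est, int latency_nice)
{
        return util_est <= sched_packing_util_max &&
               latency_nice >= sched_packing_latency_min;
}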

> 
> $> Wakeup path tunings
> ==========================
> 
> Some additional possible use-cases were already discussed in [3]:
> 
>  1. dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
>    depending on crossing certain pre-configured thresholds of latency
>    niceness.
>   
>  2. dynamically bias the vruntime updates we do in place_entity()
>    depending on the actual latency niceness of a task.
>   
>    PeterZ thinks this is dangerous but that we can "(carefully) fumble a
>    bit there."
>   
>  3. bias the decisions we take in check_preempt_tick() still depending
>    on a relative comparison of the current and wakeup task latency
>    niceness values.
> 

Nice. Thanks for listing out the usecases.

I guess latency flags would be difficult to use for usecases 2 and 3, but
a range would work for all three. For instance, usecase 2 could scale the
wakeup "sleeper credit" with the latency-nice value, as in the sketch
below.
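
A purely illustrative toy model (the scaling and names are made up; the
real place_entity() logic is more involved, and PeterZ already warned
this area needs care):

/* Scale the wakeup vruntime "sleeper credit" by latency niceness:
 * map [-20, 19] onto [1x, 0x] of the full credit, so that latency
 * sensitive tasks keep the whole bonus (wake up earlier) and latency
 * tolerant tasks get none.
 */
static long long place_entity_bonus(long long sched_latency_ns,
                                    int latency_nice)
{
        long long full_bonus = sched_latency_ns / 2;    /* default credit */

        return full_bonus * (19 - latency_nice) / 39;
}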

>> References:
>> ===========
>> [1]. https://lkml.org/lkml/2019/8/30/829
>> [2]. https://lkml.org/lkml/2019/7/25/296
> 
>   [3]. Message-ID: <20190905114709.GM2349@...ez.programming.kicks-ass.net>
>        https://lore.kernel.org/lkml/20190905114709.GM2349@hirez.programming.kicks-ass.net/
> 
> 
> Best,
> Patrick
> 

Thanks,
Parth
