Message-ID: <0ce80c5d-2433-13d5-33df-d110cf8faa9c@redhat.com>
Date:   Wed, 6 Sep 2023 16:58:11 +0200
From:   Daniel Bristot de Oliveira <bristot@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Daniel Bristot de Oliveira <bristot@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org,
        Luca Abeni <luca.abeni@...tannapisa.it>,
        Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
        Thomas Gleixner <tglx@...utronix.de>,
        Joel Fernandes <joel@...lfernandes.org>,
        Vineeth Pillai <vineeth@...byteword.org>,
        Shuah Khan <skhan@...uxfoundation.org>,
        Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH v4 6/7] sched/deadline: Deferrable dl server

On 9/6/23 10:29, Peter Zijlstra wrote:
> On Tue, Sep 05, 2023 at 05:24:40PM +0200, Daniel Bristot de Oliveira wrote:
>> On 9/5/23 15:42, Peter Zijlstra wrote:
>>> On Thu, Aug 31, 2023 at 10:28:57PM +0200, Daniel Bristot de Oliveira wrote:
>>>> +void dl_server_start(struct sched_dl_entity *dl_se, int defer)
>>>>  {
>>>> +	if (dl_se->server_state != DL_SERVER_STOPPED) {
>>>> +		WARN_ON_ONCE(!(on_dl_rq(dl_se) || dl_se->dl_throttled));
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	if (defer) {
>>>> +		/*
>>>> +		 * Postpone the replenishment to the (next period - the execution time)
> 
> perhaps explicitly mention zero-laxity here, then we all know what is
> meant, no?

Last time I used that word it caused more problems than helped :-) But I will
add it and specify that it is "for this task".

>>>> +		 *
>>>> +		 * With this in place, we have two cases:
>>>> +		 *
>>>> +		 * In the absence of DL tasks:
>>>> +		 *	The server will start at the replenishment time, getting
>>>> +		 *	its runtime before now + period. This is the expected
>>>> +		 *	throttling behavior.
>>>> +		 *
>>>> +		 * In the presence of DL tasks:
>>>> +		 *	The server will be replenished, and then it will be
>>>> +		 *	scheduled according to EDF, not breaking SCHED_DEADLINE.
>>>> +		 *
>>>> +		 *	In the first cycle, the server will be postponed by
>>>> +		 *	at most period + (period - runtime). But then the
>>>> +		 *	server will receive its runtime/period.
>>>> +		 *
>>>> +		 *	The server will, however, run on top of any RT task, which
>>>> +		 *	is the expected throttling behavior.
>>>> +		 */
>>>> +		dl_se->deadline = rq_clock(dl_se->rq) + dl_se->dl_period - dl_se->dl_runtime;
>>>
>>> I was confused by this, but this is an instance of
>>> replenish_dl_new_period(), where we explicitly do not preserve the
>>> period.
>>
>> yeah, it is two operations in one (so not straightforward). It sets
>> the deadline as now + period - runtime, so..
> 
> Right. I'll just attribute it to me being generally confused about
> everything after holidays ;-)
> 
> The thing that tripped me was that the use of rq_clock() breaks
> periodicity, except of course you want/need that when you start a new
> activation.

that rq_clock() is only used at the time at which we start the deferred
server. If the throttling condition stays on, the server will act as a
regular periodic DL task...
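
To make the arithmetic concrete, here is a standalone sketch with
made-up numbers (not from the patch):

--- %< ---
/* Sketch only: dl_runtime = 50ms, dl_period = 1s, everything in us. */
#include <stdio.h>

int main(void)
{
	unsigned long long now     = 0;       /* rq_clock() at start  */
	unsigned long long runtime = 50000;   /* dl_runtime: 50 ms    */
	unsigned long long period  = 1000000; /* dl_period:  1 s      */

	/* Deferred start: postpone the deadline to the zero-laxity
	 * point, the latest instant the server can start and still
	 * receive its full runtime before now + period. */
	unsigned long long deadline = now + period - runtime;

	printf("replenishment timer fires at t + %llu us\n", deadline);
	/* prints 950000: with runtime = 0 and dl_throttled = 1, the
	 * replenishment timer is set for the deadline, t + 950 ms. */
	return 0;
}
--- >% ---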

> 
>>>> +		/* Zero the runtime */
>>>> +		dl_se->runtime = 0;
>>>> +		/* throttle the server */
>>>> +		dl_se->dl_throttled = 1;
>>
>> as the server is throttled, it will set the replenishment timer for now + period - runtime
>> because of the deadline.
> 
> Yeah, it's a wee hack to move it to the zero-laxity point. I was
> considering if it makes sense to push that down and make it available
> for all DL tasks, but I'm not sure..

It might be useful in the future, like when DL dominates all other schedulers, so
we can have a way to schedule deferred work, like kworkers... :-) But it might be
too early for that..

>>> I'm thinking this is wrong -- ISTR there were definite benefits to explicit
>>> slack time scheduling, meaning the server should also run while DL tasks
>>> are on.
>>
>> I can add the check to enable it also in the presence of DL tasks, we just need
>> to consider that the dl server is a dl task as well when disabling.
> 
> Why not keep what was there, always run the thing when there's FAIR
> tasks around? Or do you see severe overhead from running it with only
> FAIR tasks on?

Assuming that in most cases people only have fair tasks, which is probably
true in a regular kernel... (like, no threaded IRQs).

>> It is a benefit because it will fix the case in which a CPU is full of DL tasks
>> (possible with global scheduling).
>>
>>> Additionally, running the server when there are only fair tasks
>>> is mostly harmless, right? So why this change?
>>
>> The problem is that we never know when an RT one will arrive :-/
>>
>> E.g., only fair tasks, server enabled. All fine :-) Then an RT task arrives, and gets
>> postponed by the server... RT people complain (for those thinking of adding
>> a server for RT, we will have a similar problem as we have with throttling today +
>> DL has no fixed priority).
> 
> Well, the thing is, the moment you run DL tasks on that machine those RT
> tasks will get delayed *anyway*. RT people need to stop pretending FIFO
> is the highest class.
> 
> But yeah, I can see why they might get upset if the time is then used to
> run FAIR tasks while their RT task gets to wait.

right

>> We will still face the same problem with the deferrable server, and the example is the same,
>> just with a shift in the RT arrival. E.g., only fair tasks for (server period - runtime),
>> then the server gets enabled, and 1us later the RT task arrives and gets postponed... again.
>>
>> Hence the need to enable it only after the dispatch of the kind of RT (or, to be added, DL)
>> tasks that can cause starvation.
>>
>> We could simplify it by enabling only on RT/DL arrival but... if there is nothing to
>> starve... it is not needed anyway, so there is less overhead by avoiding the enqueue...
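
To make the "enable only on RT/DL arrival" idea concrete, a rough
sketch; the hook and its placement are hypothetical, dl_server_start()
and server_state are from this series:

--- %< ---
/*
 * Rough sketch, hypothetical placement: arm the deferred fair server
 * from the RT enqueue path, and only when there are fair tasks that
 * could actually starve.
 */
static inline void fair_server_arm_on_rt_arrival(struct rq *rq)
{
	/* Nothing to protect from starvation: skip the enqueue. */
	if (!rq->cfs.h_nr_running)
		return;

	if (rq->fair_server.server_state == DL_SERVER_STOPPED)
		dl_server_start(&rq->fair_server, 1 /* defer */);
}
--- >% ---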
> 
> So one thing we could do is have update_curr_fair() decrement
> fair_server's runtime and yield the period when it hits 0 (and capping
> it at 0, not allowing it to go negative or so).
> 
> That way you only force the situation when FAIR hasn't had its allotted
> time this period, and only for as much as to make up for the time it
> lacks.

We can also decrease the runtime to a negative number while in the defer/throttle
state, and let the while loop in replenish_dl_entity() replenish with the += runtime;

here:
--- %< ---
        /*
         * We keep moving the deadline away until we get some
         * available runtime for the entity. This ensures correct
         * handling of situations where the runtime overrun is
         * arbitrary large.
         */
        while (dl_se->runtime <= 0) {
                dl_se->deadline += pi_of(dl_se)->dl_period;
                dl_se->runtime += pi_of(dl_se)->dl_runtime;
        }

--- >% ---

it is already done... no?
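
On the fair side it would be something like this (sketch only; the
helper name and its call site are hypothetical, the point is just the
negative runtime feeding the replenish loop above):

--- %< ---
/*
 * Hypothetical helper, called from update_curr_fair() with the just
 * accounted delta_exec: charge the fair runtime against the deferred
 * (throttled) server, letting the runtime go negative. The while loop
 * in replenish_dl_entity() above then keeps pushing the deadline by
 * dl_period and adding dl_runtime until the runtime is positive again.
 */
static void fair_server_charge(struct rq *rq, u64 delta_exec)
{
	struct sched_dl_entity *dl_se = &rq->fair_server;

	if (dl_se->dl_throttled)
		dl_se->runtime -= delta_exec;
}
--- >% ---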

> 
>>>
>>> One of the benefits was -- IIRC, that we no longer need to subtract some
>>> random margin from the reservation limit, but there were more I think.
>>>
>>
>> We can simplify things because we can start ignoring the rt throttling limit when
>> there is no need to throttle... like with GRUB (but only after we remove the rt
>> throttling).
>>
>> Thoughts?
> 
> The moment the fair server is there (these patches) you should also kill
> the throttling. It doesn't make sense to have both.
> 
