Message-ID: <b5e0dbb8-3aab-4316-85bb-6b7ac3134e07@redhat.com>
Date: Fri, 5 Apr 2024 16:35:49 +0200
From: Daniel Bristot de Oliveira <bristot@...hat.com>
To: Joel Fernandes <joel@...lfernandes.org>,
Daniel Bristot de Oliveira <bristot@...nel.org>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org, Luca Abeni <luca.abeni@...tannapisa.it>,
Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
Thomas Gleixner <tglx@...utronix.de>,
Vineeth Pillai <vineeth@...byteword.org>,
Shuah Khan <skhan@...uxfoundation.org>, Phil Auld <pauld@...hat.com>,
David Vernet <void@...ifault.com>
Subject: Re: [PATCH v5 6/7] sched/deadline: Deferrable dl server
>> There will always be a person reading these emails and echoing the wrong things...
>> using 0-lax/0-laxity term here is a lose-lose.
>
> Agreed, so why not update your patch changelog to correct that (or post a new
> revision)?
That v6 in the repo was a partial update that I pushed there to sync our work. The v6 that I will
send already has that removed.
>>> So, Vineeth and me came up with a patch below to "max cap" the DL server 0-lax
>>> time (max cap is default off keeping the regular behavior). This is needed to
>>> guarantee bandwidth for periodic CFS runners/sleepers.
>>
>> Another point... "guarantee bandwidth"... the bandwidth is provided under certain conditions.
>> If the conditions are not respected, the guarantee a dl reservation will provide is that
>> the task will not put a utilization higher than the one requested, so yes, a dl reservation
>> can and will enforce a lower utilization if the task does not respect the conditions.
>> Also, if the reservation is ready, but no task is ready...
>
> Please clarify what conditions you mean? The conditions I am looking for are
> those given by RT throttling (see below). i.e., if RT takes up all of the CPU in
> a certain amount of time, then there is a certain amount reserved for CFS.
>
>>> The example usecase is:
>>>
>>> Consider DL server params 25ms / 50ms.
>>>
>>> Consider CFS task with duty cycle of 25ms / 76ms (run 25ms sleep 51ms).
>>
>> define duty... like, runtime 25, period 76? sleeps for 51 relative to a starting time
>> or not?
>
> There is no starting time. The CFS task in the quoted example only does run +
> sleep. Run for 25ms and sleep for 51ms, then run again for 25ms, etc.
>
>>
>> there are some holes in your explanation, it is tricky to reply inline for these cases...
>> I am continuing but....
>>
>>>
>>> run 25ms run 25ms
>>> _______ _______
>>> | | sleep 51 | | sleep 51
>>> -|------|-------|---------|---------|-------|----------|--------|------> t
>>> 0 25 50 101 126 151 202 227
>>> \ 0-lax / \ 0-lax /
>
> So before going into the discussion below, I just want to mention that if the
> DL-server cannot provide the same bandwidth that the RT throttling provided,
> then it's broken by definition. And the breakage comes specifically because of
> this patch and nothing else. There are many breakages with this patch and I will
> go over some of them with unit tests below. Basically, in my view -- if a test
> case shows it works with RT throttling, but not with the DL-server, then it's
> broken. Period. And most of those functional "breakages" come about because of
> this patch (6/7) and not the initial series actually.
>
> Here are some cases to shed some light:
>
> Case 1. Consider a CFS task with runtime 15ms and period 50ms. With the
> parameters set to 25ms runtime and 50ms period.
>
> The test fails with DL server (because of 6/7), and passes with RT throttling.
> See results below. For this test's code, see: https://shorturl.at/rwW07
A reproducer always helps. So, your task there is not a periodic task... it is
a sporadic task because it sleeps for a fixed amount of time after the runtime.
A periodic task with period 76 would wake at 0, 76, 152 - like cyclictest...
thus consuming CPU at a fixed rate if the scheduler allows it.
In the case of a fixed sleep time at the end of the execution, it will end up
"throwing away bandwidth" if the runtime is not given at the beginning of the
period because it will run slower... accumulating error. But that was not the
main point here...
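Just to make the two activation patterns concrete, here is a minimal userspace sketch (my own
illustration, not the reproducer I mention below), using the 25 ms / 76 ms numbers from your
example:

#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* Burn CPU for roughly @runtime_ns: this is the "run 25 ms" part. */
static void burn(uint64_t runtime_ns)
{
	uint64_t start = now_ns();

	while (now_ns() - start < runtime_ns)
		;
}

/* The quoted reproducer's pattern: run 25 ms, then sleep 51 ms (relative). */
void run_sporadic(int loops)
{
	struct timespec sleep_ts = { 0, 51 * 1000000L };

	while (loops--) {
		burn(25 * 1000000ULL);
		/* relative sleep: the next activation drifts if the run was delayed */
		clock_nanosleep(CLOCK_MONOTONIC, 0, &sleep_ts, NULL);
	}
}

/* A periodic task, cyclictest-like: wake at t = 0, 76 ms, 152 ms, ... */
void run_periodic(int loops)
{
	struct timespec next;

	clock_gettime(CLOCK_MONOTONIC, &next);
	while (loops--) {
		burn(25 * 1000000ULL);
		/* absolute sleep: activations stay on the 76 ms grid */
		next.tv_nsec += 76 * 1000000L;
		if (next.tv_nsec >= (long)NSEC_PER_SEC) {
			next.tv_nsec -= (long)NSEC_PER_SEC;
			next.tv_sec++;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
	}
}

int main(void)
{
	run_periodic(10);	/* or run_sporadic(10) for the sleep-after-run pattern */
	return 0;
}

The periodic variant keeps asking for its runtime at a fixed rate; the sporadic one drifts by
however long the runtime was delayed, which is the "accumulating error" above.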
The problem here was more like: if a fair task goes to sleep in the middle of
the server activation (waiting for a lock, for example), and then wakes up again, the v5 code
forces it to defer... again. Thus, it gets less bandwidth... notice that this
does not even need to happen at the start of the period; it can happen in the middle of the
execution.
Intuitively, reducing the deferred time would help there. But the best thing to do is:
if the fair task already waited for the defer, and the real-time tasks are still using all
the CPU time, do not defer the activation again, and keep the defer mechanism disabled
until the real-time tasks allow the fair scheduler to run in the background. That is,
make the defer mode equivalent to the non-defer mode until the RT tasks start
to behave again.
For that, in v6, there is a new variable, dl_defer_running. Once the dl_server
is enqueued after the defer time, dl_defer_running is set.
If the fair task sleeps in the middle of the period, that variable does not change.
If the fair task wakes up and dl_defer_running is still set, do not defer:
keep running until the reservation is consumed.
dl_defer_running is set back to 0 only after the fair tasks consume
their runtime without needing the dl_server... IOW, when the RT tasks start to
behave again.
No interface change.
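To illustrate the behavior described above, here is a simplified, self-contained model of the
dl_defer_running logic (the struct and helper names are mine, this is not the actual v6 kernel
code):

#include <stdbool.h>
#include <stdio.h>

struct dl_server_model {
	bool dl_defer;		/* server configured to run in deferred mode */
	bool dl_defer_running;	/* set once the server started after the defer */
};

/* Called when the deferred server is finally enqueued, after the defer time. */
void server_started_after_defer(struct dl_server_model *s)
{
	s->dl_defer_running = true;
}

/*
 * Called when a fair task wakes up and the server has to decide whether to
 * defer again: while dl_defer_running is set (RT is still hogging the CPU),
 * behave like a regular, non-deferred server.
 */
bool should_defer(struct dl_server_model *s)
{
	return s->dl_defer && !s->dl_defer_running;
}

/*
 * Called when fair tasks consumed their runtime without needing the server,
 * i.e., the RT tasks are behaving again: re-arm the defer behavior.
 */
void fair_ran_without_server(struct dl_server_model *s)
{
	s->dl_defer_running = false;
}

int main(void)
{
	struct dl_server_model s = { .dl_defer = true, .dl_defer_running = false };

	server_started_after_defer(&s);	/* server ran once after the defer time */
	/* fair task slept and woke up again within the same period: */
	printf("defer again? %s\n", should_defer(&s) ? "yes" : "no");	/* no */
	fair_ran_without_server(&s);	/* RT tasks are behaving again */
	printf("defer again? %s\n", should_defer(&s) ? "yes" : "no");	/* yes */
	return 0;
}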
With that in place, your reproducers are working. I have a periodic version
of your reproducer, which also improves how the task consumes the runtime... I
will send it to you so you can have a look.
>
> Specifically, it breaks because of this patch (6/7). If you revert the patch,
> the issue goes away.
>
> With the patch 6/7:
> # ./dlserver_test
> # Runtime of PID 85 is 0.430000 seconds
> Bail out! Runtime of PID 85 is not within 10% of expected runtime 0.900000
>
> Without the patch 6/7:
> # ./dlserver_test
> # Runtime of PID 87 is 0.900000 seconds
> ok 1 PASS
>
> So basically, because of defer (or whatever you want to call it ;)), it gives
> less than 50% of the bandwidth that it gave without the defer.
There was a problem with the non-defer mode as well: dl_server_start() was
missing a set need resched. Fixed that in v6.
> Further, my impression is this patch (6/7) does not even solve all the issue it
> intended. For example, consider that a CFS task is in the boosted phase, and now
> an RT task wakes up. That RT task *will wait* for possibly the whole runtime
> granted to CFS, so it might not always help. Contrasting that with RT
> throttling, if an RT task is very well behaved (well behaved defined as not
> running to the limit that RT throttling should kick in), and it wakes up, it
> will run right away without any wait time, regardless of what CFS was or was not
> doing.
I fixed that as well.
The problem happens when a DL server has a large runtime (>=~ 50%).
Let's say 25 ms runtime, 50 ms period.
At time 0, the defer timer will be set at 25 ms (50 - 25).
From 0 to 25, the RT task would consume, for instance, only 2 ms... so
it is behaving...
At time 25, the defer timer fires... and as the fair task ran for 23 ms
(25 - 2 ms taken by RT) it still has 2 ms of runtime left... so the server
is activated... which is not correct.
The change I made in v6 is:
Same case...
At time 25, the defer timer fires...
Then, the timer will re-compute the defer time:
	If the RT tasks are behaving, forward the timer to the
	new (deadline - remaining runtime).
	return;
For instance, in the previous case, the new defer timer would fire at
50 ms - 2 ms = 48 ms.
CFS will continue working, consuming runtime and resetting the period to avoid
activating the dl server.
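As a simplified, self-contained model of that decision (again, illustrative names and types,
not the actual v6 code), the timer handler essentially does:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef int64_t ktime_ns;

struct dl_server {
	ktime_ns deadline;		/* absolute deadline of the current period */
	ktime_ns remaining_runtime;	/* server runtime not yet consumed by fair */
	bool rt_behaving;		/* fair tasks got CPU time without the server */
};

void activate_dl_server(struct dl_server *s)
{
	/* placeholder: in the kernel this would enqueue the server entity */
	(void)s;
}

/*
 * Defer timer handler, modeled on the 25/50 example above: it fires at
 * t = 25 ms with 2 ms of server runtime left and the deadline at 50 ms.
 * Returns the new absolute expiration, or 0 if the server was activated.
 */
ktime_ns defer_timer_fired(struct dl_server *s)
{
	if (s->rt_behaving) {
		/*
		 * RT left room for the fair tasks: do not activate the server,
		 * just forward the timer to the latest point where the
		 * remaining runtime still fits before the deadline,
		 * e.g. 50 ms - 2 ms = 48 ms.
		 */
		return s->deadline - s->remaining_runtime;
	}
	/* RT is monopolizing the CPU: activate the server (boosted) now. */
	activate_dl_server(s);
	return 0;
}

int main(void)
{
	/* the 25/50 example at t = 25 ms: RT used 2 ms, fair used 23 ms */
	struct dl_server s = {
		.deadline = 50 * 1000000LL,
		.remaining_runtime = 2 * 1000000LL,
		.rt_behaving = true,
	};

	printf("defer timer forwarded to %lld ms\n",
	       (long long)(defer_timer_fired(&s) / 1000000));
	return 0;
}

So in the 25/50 example the server is not activated at 25 ms; the timer is just pushed to 48 ms,
and it only boosts the fair tasks if the RT tasks are still monopolizing the CPU at that point.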
The idea of forwarding the timer was taken from the cfs period timer. It is also
possible to forward the timer at other points... if necessary...
I did more testing, with different task sets, including tasks that go to sleep...
it is working as expected.
-- Daniel
> thanks,
>
> - Joel
>