Message-ID: <aFFdseGAqImLtVCH@jlelli-thinkpadt14gen4.remote.csb>
Date: Tue, 17 Jun 2025 14:21:05 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>
Cc: luca abeni <luca.abeni@...tannapisa.it>, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vineeth Pillai <vineeth@...byteword.org>
Subject: Re: SCHED_DEADLINE tasks missing their deadline with
SCHED_FLAG_RECLAIM jobs in the mix (using GRUB)
On 02/06/25 16:59, Marcel Ziswiler wrote:
> Hi Juri
>
> On Thu, 2025-05-29 at 11:39 +0200, Juri Lelli wrote:
...
> > It should help us to better understand your setup and possibly reproduce
> > the problem you are seeing.
OK, it definitely took a while (apologies), but I think I managed to
reproduce the issue you are seeing.
I added SCHED_FLAG_RECLAIM support to rt-app [1], so it's easier for me
to play with the taskset and got to the following two situations when
running your coreX taskset on CPU1 of my system (since the issue is
already reproducible, I think it's OK to ignore the other tasksets as
they are running isolated on different CPUs anyway).
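For readers who have not used the flag: a minimal sketch of how a task might request SCHED_DEADLINE with SCHED_FLAG_RECLAIM through sched_setattr(2). This is not the rt-app change referenced in [1]. The names `SchedAttr`, `deadline_attr` and `apply_to_current_thread` are hypothetical, the structure mirrors the original 48-byte `struct sched_attr` layout, and the default syscall number 314 is an assumption that holds for x86_64 only.

```python
import ctypes
import os

SCHED_DEADLINE = 6         # policy number from the Linux UAPI headers
SCHED_FLAG_RECLAIM = 0x02  # request GRUB bandwidth reclaiming


class SchedAttr(ctypes.Structure):
    """Hypothetical mirror of the kernel's struct sched_attr (48-byte layout)."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),   # nanoseconds
        ("sched_deadline", ctypes.c_uint64),  # nanoseconds
        ("sched_period", ctypes.c_uint64),    # nanoseconds
    ]


def deadline_attr(runtime_ns, period_ns, reclaim=False):
    """Build attributes for an implicit-deadline task (deadline == period)."""
    attr = SchedAttr()
    attr.size = ctypes.sizeof(SchedAttr)
    attr.sched_policy = SCHED_DEADLINE
    attr.sched_flags = SCHED_FLAG_RECLAIM if reclaim else 0
    attr.sched_runtime = runtime_ns
    attr.sched_deadline = period_ns
    attr.sched_period = period_ns
    return attr


def apply_to_current_thread(attr, syscall_nr=314):
    """Call sched_setattr(0, &attr, 0); 314 is the x86_64 number (assumption)."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.syscall(ctypes.c_long(syscall_nr), ctypes.c_int(0),
                       ctypes.byref(attr), ctypes.c_uint(0))
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


# Last task of the taskset: 0.5 ms runtime every 10 ms, with reclaiming on.
attr = deadline_attr(500_000, 10_000_000, reclaim=True)
```

Calling `apply_to_current_thread(attr)` normally needs root (or CAP_SYS_NICE) and can fail if admission control rejects the requested bandwidth.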
This is your coreX taskset, in which the last task is the badly behaving one;
it runs first without and then with RECLAIM in the test.
| sched_deadline = sched_period | sched_runtime | CP max run time (90% of sched_runtime) | utilisation | reclaim |
| -- | -- | -- | -- | -- |
| 5 ms | 0.15 ms | 0.135 ms | 3.00% | no |
| 10 ms | 1.8 ms | 1.62 ms | 18.00% | no |
| 10 ms | 2.1 ms | 1.89 ms | 21.00% | no |
| 14 ms | 2.3 ms | 2.07 ms | 16.43% | no |
| 50 ms | 8.0 ms | 7.20 ms | 16.00% | no |
| 10 ms | 0.5 ms | **1 | 5.00% | no |
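As a quick check of the utilisation column, each value is sched_runtime divided by sched_period. The sketch below recomputes the column and the total for this taskset; the variable and function names are only illustrative.

```python
# (period_ms, runtime_ms) for each task, in table order.
tasks_ms = [(5, 0.15), (10, 1.8), (10, 2.1), (14, 2.3), (50, 8.0), (10, 0.5)]


def utilisation(period_ms, runtime_ms):
    """Fraction of one CPU reserved by a task: runtime / period."""
    return runtime_ms / period_ms


per_task = [utilisation(p, r) for p, r in tasks_ms]
total = sum(per_task)

for (p, r), u in zip(tasks_ms, per_task):
    print(f"{p:>3} ms  {r:>4} ms  {u:7.2%}")
print(f"total utilisation: {total:.2%}")  # 79.43%
```

The total reserved bandwidth is about 79.43%, so the taskset fits under a 95% cap with room to spare.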
Without reclaim everything looks good (apart from the 1st task, which I
think suffers a bit from the granularity/precision of rt-app's runtime
loop):
https://github.com/jlelli/misc/blob/main/deadline-no-reclaim.png
The order is the same as above; the last task gets constantly throttled and
does no harm to the rest.
With reclaim (enabled only for the last, misbehaving task) we indeed seem to have a problem:
https://github.com/jlelli/misc/blob/main/deadline-reclaim.png
Essentially all other tasks are experiencing long wakeup delays that
cause deadline misses. The badly behaving task seems to be able to almost
monopolize the CPU. It is interesting to note that, even though I left the
maximum available bandwidth at 95%, the CPU is 100% busy.
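For context on the 95% figure: on a stock kernel the bandwidth available to deadline tasks is derived from the `sched_rt_runtime_us` and `sched_rt_period_us` sysctls (950000 and 1000000 by default). A minimal sketch for reading it follows; `dl_bandwidth_cap` is a hypothetical helper, and treating a runtime of -1 as "no limit" is an assumption based on how that value is documented for these sysctls.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/kernel")


def dl_bandwidth_cap(runtime_us=None, period_us=None):
    """Return the bandwidth cap as a fraction of one CPU.

    Values are read from /proc/sys/kernel when not passed explicitly.
    """
    if runtime_us is None:
        runtime_us = int((SYSCTL_DIR / "sched_rt_runtime_us").read_text())
    if period_us is None:
        period_us = int((SYSCTL_DIR / "sched_rt_period_us").read_text())
    if runtime_us < 0:
        return 1.0  # -1 disables the limit (assumption, see lead-in)
    return runtime_us / period_us


# With the default values the cap is 95%.
print(f"{dl_bandwidth_cap(950000, 1000000):.0%}")
```

With a 95% cap, reclaiming should leave roughly 5% of the CPU to non-deadline work, which is why a 100% busy CPU stands out here.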
So, yeah, Luca, I think we have a problem. :-)
Will try to find more time soon and keep looking into this.
Thanks,
Juri
[1] https://github.com/jlelli/rt-app/tree/reclaim