Message-ID: <f532441d8b3cf35e7058305fd9cd3f2cbd3a9fac.camel@codethink.co.uk>
Date: Sat, 03 May 2025 13:14:53 +0200
From: Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>, Peter
 Zijlstra <peterz@...radead.org>, Vineeth Pillai <vineeth@...byteword.org>,
 Luca Abeni	 <luca.abeni@...tannapisa.it>
Subject: Re: SCHED_DEADLINE tasks missing their deadline with
 SCHED_FLAG_RECLAIM jobs in the mix (using GRUB)

Hi Juri

Thanks for getting back to me.

On Fri, 2025-05-02 at 15:55 +0200, Juri Lelli wrote:
> Hi Marcel,
> 
> On 28/04/25 20:04, Marcel Ziswiler wrote:
> > Hi
> > 
> > As part of our trustable work [1], we also run a lot of real-time scheduler (SCHED_DEADLINE) tests on the
> > mainline Linux kernel. Overall, the Linux scheduler proves quite capable of scheduling deadline tasks down
> > to a granularity of 5 ms on both of our test systems (amd64-based Intel NUCs and aarch64-based RADXA
> > ROCK5Bs). However, we recently noticed a lot of deadline misses once we introduce overrunning jobs with
> > reclaim mode enabled (SCHED_FLAG_RECLAIM) using GRUB (Greedy Reclamation of Unused Bandwidth). E.g. from
> > hundreds of millions of test runs over the course of a full week, where we usually see absolutely zero
> > deadline misses, we now see 43 million deadline misses on the NUC and 600 thousand on the ROCK5B (which
> > also has double the CPU cores). This is with otherwise exactly the same test configuration, which adds
> > exactly the same two overrunning jobs to the job mix, but once without reclaim enabled and once with
> > reclaim enabled.
> > 
> > We are wondering whether there are any known limitations to GRUB or what exactly could be the issue.
> > 
> > We are happy to provide more detailed debugging information but are looking for suggestions on what
> > exactly to look at and how.
> 
> Could you add details of the taskset you are working with? The number of
> tasks, their reservation parameters (runtime, period, deadline) and how
> much they are running (or trying to run) each time they wake up. Also
> which one is using GRUB and which one maybe is not.

We currently use three cores as follows:

#### core x

| sched_deadline = sched_period | sched_runtime | CP max run time (90% of sched_runtime) | utilisation | reclaim |
| -- | -- | -- | -- | -- |
|  5 ms  | 0.15 ms | 0.135 ms |  3.00% | no |
| 10 ms  | 1.8 ms  | 1.62 ms  | 18.00% | no |
| 10 ms  | 2.1 ms  | 1.89 ms  | 21.00% | no |
| 14 ms  | 2.3 ms  | 2.07 ms  | 16.43% | no |
| 50 ms  | 8.0 ms  | 7.20 ms  | 16.00% | no |
| 10 ms  | 0.5 ms  | **1      |  5.00% | no |

Total utilisation of core x is 79.43% (less than 100%)

**1 - this shall be a rogue process. This process will
 a) run for the maximum allowed workload value, and
 b) not collect execution data.

This last rogue process is the one which massively disrupts scheduling for the rest of the taskset if we set it
to do reclaim.
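
For completeness, each job admits itself via sched_setattr(2). Below is a minimal sketch of how the rogue job
would be set up with reclaim enabled. This is an illustration only, not our actual harness; the helper name
do_sched_setattr() is ours, since older glibc ships no wrapper for the syscall.

```c
/* Minimal sketch: admit the rogue job (0.5 ms / 10 ms) with GRUB
 * reclaiming enabled. Illustration only, not our actual test code. */
#define _GNU_SOURCE
#include <linux/sched.h>       /* SCHED_DEADLINE, SCHED_FLAG_RECLAIM */
#include <linux/sched/types.h> /* struct sched_attr (uapi) */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Older glibc has no sched_setattr() wrapper, so call it directly. */
static int do_sched_setattr(pid_t pid, const struct sched_attr *attr,
                            unsigned int flags)
{
	return syscall(SYS_sched_setattr, pid, attr, flags);
}

int main(void)
{
	struct sched_attr attr = {
		.size           = sizeof(attr),
		.sched_policy   = SCHED_DEADLINE,
		.sched_flags    = SCHED_FLAG_RECLAIM,  /* enable GRUB */
		.sched_runtime  =      500 * 1000ULL,  /* 0.5 ms, in ns */
		.sched_deadline = 10 * 1000 * 1000ULL, /* 10 ms */
		.sched_period   = 10 * 1000 * 1000ULL, /* == deadline */
	};

	if (do_sched_setattr(0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}
	/* ... overrun: busy-loop for the maximum allowed workload ... */
	return 0;
}
```

The well-behaved jobs are admitted the same way, just without SCHED_FLAG_RECLAIM in sched_flags.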

#### core y

| sched_deadline = sched_period | sched_runtime | CP max run time (90% of sched_runtime) | utilisation | reclaim |
| -- | -- | -- | -- | -- |
|  5 ms  | 0.5 ms | 0.45 ms | 10.00% | no |
| 10 ms  | 1.9 ms | 1.71 ms | 19.00% | no |
| 12 ms  | 1.8 ms | 1.62 ms | 15.00% | no |
| 50 ms  | 5.5 ms | 4.95 ms | 11.00% | no |
| 50 ms  | 9.0 ms | 8.10 ms | 18.00% | no |

Total utilisation of core y is 73.00% (less than 100%)

#### core z

The third core is special in that it runs 50 jobs, all with the identical configuration:

| sched_deadline = sched_period | sched_runtime | CP max run time (90% of sched_runtime) | utilisation |
| -- | -- | -- | -- |
|  50 ms  | 0.8 ms | 0.72 ms | 1.60% |

Jobs 1-50 all run with reclaim OFF.

Total utilisation of core z is 50 * 1.60% = 80.00% (less than 100%)
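
In case the shape of the individual jobs matters: each one is basically a periodic busy-loop. A hypothetical
sketch of one core z job (again an illustration, not our actual harness) looks like this:

```c
/* Hypothetical sketch of one core z job (0.8 ms / 50 ms): burn CPU for
 * 90% of the runtime budget, sleep to the next period boundary, and
 * count overruns. Illustration only, not our actual harness. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t now_ns(void)
{
	struct timespec t;
	clock_gettime(CLOCK_MONOTONIC, &t);
	return (int64_t)t.tv_sec * NSEC_PER_SEC + t.tv_nsec;
}

int main(void)
{
	const int64_t period_ns = 50 * 1000 * 1000; /* 50 ms */
	const int64_t work_ns   =      720 * 1000;  /* 0.72 ms = 90% of 0.8 ms */
	int64_t next = now_ns();
	long misses = 0, activations = 0;

	/* ... admit ourselves with sched_setattr() as in the sketch above ... */

	for (;;) {
		int64_t start = now_ns();
		while (now_ns() - start < work_ns)
			;			/* busy-burn the budget */

		next += period_ns;
		if (now_ns() > next)
			misses++;		/* overran the period boundary */

		struct timespec ts = { next / NSEC_PER_SEC, next % NSEC_PER_SEC };
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);

		if (++activations % 1000 == 0)
			fprintf(stderr, "%ld misses / %ld activations\n",
				misses, activations);
	}
}
```

Since sched_deadline == sched_period for all jobs, still running past the next absolute activation time counts
as a deadline miss.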

Please let me know if you need any further details which may help figure out what exactly is going on.

> Adding Luca in Cc so he can also take a look.
> 
> Thanks,

Thank you!

> Juri

Cheers

Marcel
