linux-kernel - Re: SCHED_DEADLINE tasks missing their deadline with SCHED_FLAG

Message-ID: <20250507222549.183e0b4a@nowhere>
Date: Wed, 7 May 2025 22:25:49 +0200
From: luca abeni <luca.abeni@...tannapisa.it>
To: Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>
Cc: Juri Lelli <juri.lelli@...hat.com>, linux-kernel@...r.kernel.org, Ingo
 Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Vineeth
 Pillai <vineeth@...byteword.org>
Subject: Re: SCHED_DEADLINE tasks missing their deadline with
 SCHED_FLAG_RECLAIM jobs in the mix (using GRUB)

Hi Marcel,

just a quick question to better understand your setup (and check where
the issue comes from):
in the email below, you say that tasks are statically assigned to
cores; how did you do this? Did you use isolated cpusets, or did you
set the tasks' affinities after disabling the SCHED_DEADLINE admission
control (echo -1 > /proc/sys/kernel/sched_rt_runtime_us)?

Or am I misunderstanding your setup?

Also, are you using HRTICK_DL?
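
For reference, the second option asked about above (affinities set after
disabling admission control) could be checked and applied roughly as in the
sketch below. This is only an illustration under stated assumptions: the
helper names are hypothetical, the sysctl path is the one from the echo
command above, and pinning uses Python's os.sched_setaffinity() rather than
whatever tool the setup actually uses. Writing the sysctl and changing
another task's affinity normally require root.

```python
import os

# Path taken from the echo command in the message above.
RT_RUNTIME_PATH = "/proc/sys/kernel/sched_rt_runtime_us"


def admission_control_disabled(raw: str) -> bool:
    """Return True if the sysctl content is -1 (the value written by the echo above)."""
    return int(raw.strip()) == -1


def read_rt_runtime(path: str = RT_RUNTIME_PATH) -> str:
    """Read the raw sysctl value (hypothetical helper, Linux only)."""
    with open(path) as f:
        return f.read()


def pin_to_cpu(pid: int, cpu: int) -> None:
    """Restrict the task `pid` to a single CPU (pid 0 means the caller)."""
    os.sched_setaffinity(pid, {cpu})
```

A caller would first confirm admission_control_disabled(read_rt_runtime())
and only then call pin_to_cpu() on each SCHED_DEADLINE task.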


			Thanks,
				Luca

On Sat, 03 May 2025 13:14:53 +0200
Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk> wrote:
[...]
> We currently use three cores as follows:
> 
> #### core x
> 
> | sched_deadline = sched_period | sched_runtime | CP max run time 90% of sched_runtime | utilisation | reclaim |
> | -- | -- | -- | -- | -- |
> |  5 ms  | 0.15 ms | 0.135 ms |  3.00% | no |
> | 10 ms  | 1.8 ms  | 1.62 ms  | 18.00% | no |
> | 10 ms  | 2.1 ms  | 1.89 ms  | 21.00% | no |
> | 14 ms  | 2.3 ms  | 2.07 ms  | 16.43% | no |
> | 50 ms  | 8.0 ms  | 7.20 ms  | 16.00% | no |
> | 10 ms  | 0.5 ms  | **1      |  5.00% | no |
> 
> Total utilisation of core x is 79.43% (less than 100%)
> 
> **1 - this shall be a rogue process. This process will
>  a) run for the maximum allowed workload value
>  b) not collect execution data
> 
> This last rogue process is the one which causes massive issues to the
> rest of the scheduling if we set it to do reclaim.
> 
> #### core y
> 
> | sched_deadline = sched_period | sched_runtime | CP max run time 90% of sched_runtime | utilisation | reclaim |
> | -- | -- | -- | -- | -- |
> |  5 ms  | 0.5 ms | 0.45 ms | 10.00% | no |
> | 10 ms  | 1.9 ms | 1.71 ms | 19.00% | no |
> | 12 ms  | 1.8 ms | 1.62 ms | 15.00% | no |
> | 50 ms  | 5.5 ms | 4.95 ms | 11.00% | no |
> | 50 ms  | 9.0 ms | 8.10 ms | 18.00% | no |
> 
> Total utilisation of core y is 73.00% (less than 100%)
> 
> #### core z
> 
> The third core is special as it will run 50 jobs with the same
> configuration as such:
> 
> | sched_deadline = sched_period | sched_runtime | CP max run time 90% of sched_runtime | utilisation |
> | -- | -- | -- | -- |
> |  50 ms  | 0.8 ms | 0.72 ms | 1.60% |
> 
> jobs 1-50 should run with reclaim OFF
> 
> Total utilisation of core z is 1.6 * 50 = 80.00% (less than 100%)
> 
> Please let me know if you need any further details which may help
> figuring out what exactly is going on.
> 
> > Adding Luca in Cc so he can also take a look.
> > 
> > Thanks,  
> 
> Thank you!
> 
> > Juri  
> 
> Cheers
> 
> Marcel
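
The per-core totals in the quoted tables can be re-derived by summing
sched_runtime / sched_period over the tasks of each core. The sketch below
does only that arithmetic; the CORES dictionary and the function name are
made up for illustration, and the (period, runtime) pairs in milliseconds
are copied from the tables above.

```python
# (sched_period_ms, sched_runtime_ms) pairs copied from the quoted tables.
CORES = {
    "x": [(5, 0.15), (10, 1.8), (10, 2.1), (14, 2.3), (50, 8.0), (10, 0.5)],
    "y": [(5, 0.5), (10, 1.9), (12, 1.8), (50, 5.5), (50, 9.0)],
    "z": [(50, 0.8)] * 50,  # 50 identical jobs
}


def utilisation(tasks):
    """Total utilisation of a task set: sum of runtime / period."""
    return sum(runtime / period for period, runtime in tasks)


def report():
    """Print each core's total utilisation as a percentage."""
    for core, tasks in CORES.items():
        print(f"core {core}: {utilisation(tasks) * 100:.2f}%")
```

Calling report() gives totals that agree with the figures quoted above
(79.43%, 73.00% and 80.00%), so all three cores are below 100% as stated.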