Message-ID: <ce8469c4fb2f3e2ada74add22cce4bfe61fd5bab.camel@codethink.co.uk>
Date: Mon, 28 Apr 2025 20:04:09 +0200
From: Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>
To: linux-kernel@...r.kernel.org
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
 Juri Lelli <juri.lelli@...hat.com>, Vineeth Pillai
 <vineeth@...byteword.org>, Daniel Bristot de Oliveira <bristot@...nel.org>
Subject: SCHED_DEADLINE tasks missing their deadline with SCHED_FLAG_RECLAIM
 jobs in the mix (using GRUB)

Hi

As part of our trustable work [1], we run a lot of real-time scheduler (SCHED_DEADLINE) tests on the
mainline Linux kernel. Overall, the Linux scheduler proves quite capable of scheduling deadline tasks down to a
granularity of 5 ms on both of our test systems (amd64-based Intel NUCs and aarch64-based RADXA ROCK5Bs).
Recently, however, we noticed a lot of deadline misses when we introduce overrunning jobs with reclaim mode
enabled (SCHED_FLAG_RECLAIM), i.e. using GRUB (Greedy Reclamation of Unused Bandwidth). For example, across
hundreds of millions of test runs over the course of a full week, where we usually see absolutely zero deadline
misses, we see 43 million deadline misses on the NUC and 600 thousand on the ROCK5B (which also has double the
CPU cores). Both results come from otherwise exactly the same test configuration, which adds exactly the same
two overrunning jobs to the job mix, once without reclaim enabled and once with reclaim enabled.
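
For reference, here is a minimal sketch of how a task can request SCHED_DEADLINE with or without
SCHED_FLAG_RECLAIM through the sched_setattr(2) system call. It is not the actual test code: the helper names
(dl_make_attr, dl_apply), the local struct dl_attr and the example parameter values are assumptions made for
illustration. The numeric constants mirror the kernel UAPI values for SCHED_DEADLINE (6) and
SCHED_FLAG_RECLAIM (0x02).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Kernel UAPI values under local (hypothetical) names, so they cannot
 * clash with libc headers that may already define the originals. */
#define DL_POLICY_DEADLINE 6     /* SCHED_DEADLINE */
#define DL_FLAG_RECLAIM    0x02  /* SCHED_FLAG_RECLAIM */

/* Local mirror of the first (48-byte) version of struct sched_attr. */
struct dl_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;   /* nanoseconds */
    uint64_t sched_deadline;  /* nanoseconds */
    uint64_t sched_period;    /* nanoseconds */
};
static_assert(sizeof(struct dl_attr) == 48, "must match SCHED_ATTR_SIZE_VER0");

/* Build the attributes for a deadline task; reclaim != 0 enables GRUB. */
struct dl_attr dl_make_attr(uint64_t runtime_ns, uint64_t deadline_ns,
                            uint64_t period_ns, int reclaim)
{
    struct dl_attr a;

    memset(&a, 0, sizeof(a));
    a.size = sizeof(a);
    a.sched_policy = DL_POLICY_DEADLINE;
    a.sched_flags = reclaim ? DL_FLAG_RECLAIM : 0;
    a.sched_runtime = runtime_ns;
    a.sched_deadline = deadline_ns;
    a.sched_period = period_ns;
    return a;
}

/* Apply the attributes to the calling thread (pid 0). Returns 0 on
 * success, -1 with errno set otherwise; usually needs CAP_SYS_NICE. */
int dl_apply(const struct dl_attr *a)
{
    return (int)syscall(SYS_sched_setattr, 0, a, 0);
}
```

With these helpers, a reclaiming task with an assumed 1 ms runtime and a 5 ms deadline and period would be set
up as dl_apply(&attr) after attr = dl_make_attr(1000000, 5000000, 5000000, 1). Passing 0 as the last argument
gives the same task without reclaiming.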

We are wondering whether there are any known limitations to GRUB or what exactly could be the issue.

We are happy to provide more detailed debugging information, but are looking for suggestions on what exactly
to look at and how.

Any help is much appreciated. Thanks!

Cheers

Marcel

[1] https://projects.eclipse.org/projects/technology.tsf