Message-ID: <ac735cc59bbdccd3e99f5fa9c779b3904d19f990.camel@codethink.co.uk>
Date: Fri, 12 Sep 2025 12:03:01 +0200
From: Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>, Peter
 Zijlstra <peterz@...radead.org>, Vineeth Pillai <vineeth@...byteword.org>,
 Luca Abeni	 <luca.abeni@...tannapisa.it>
Subject: Re: SCHED_DEADLINE tasks causing WARNING at kernel/sched/sched.h
 message

Hi Juri

Thanks for getting back to me and sorry for my late reply.

On Thu, 2025-09-04 at 11:22 +0200, Juri Lelli wrote:
> Hi Marcel,
> 
> On 02/09/25 18:49, Marcel Ziswiler wrote:
> > Hi
> > 
> > On Tue, 2025-09-02 at 16:06 +0200, Marcel Ziswiler wrote:
> > > As part of our trustable work [1], we also run a lot of real time scheduler (SCHED_DEADLINE) tests on the
> > > mainline Linux kernel (v6.16.2 in below reported case).
> > 
> > Looking through more logs from earlier test runs I found similar WARN_ONs dating back as early as v6.15.3.
> > So it does not look like a "new" issue in that sense.
> > 
> > [snip]
> > 
> > Any help is much appreciated. Thanks!
> 
> What's the actual workload composition leading the warning. I noticed
> stress-ng in the report. Could you please share more details?

Yes, sure. It's actually the exact same workload as in the regression I reported back in April [1].

We currently use three cores as follows:

#### core x

|sched_deadline = sched_period | sched_runtime | CP max run time 90% of sched_runtime | utilisation | reclaim |
| -- | -- | -- | -- | -- |
|  5 ms  | 0.15 ms | 0.135 ms |  3.00% | no |
| 10 ms  | 1.8 ms  | 1.62 ms  | 18.00% | no |
| 10 ms  | 2.1 ms  | 1.89 ms  | 21.00% | no |
| 14 ms  | 2.3 ms  | 2.07 ms  | 16.43% | no |
| 50 ms  | 8.0 ms  | 7.20 ms  | 16.00% | no |
| 10 ms  | 0.5 ms  | **1      |  5.00% | no |

Total utilisation of core x is 79.43% (less than 100%)

**1 - this is a rogue process. This process will
 a) run for the maximum allowed workload value
 b) not collect execution data
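
For illustration only (this is not taken from the test harness): a minimal sketch of how one row of the table above could be expressed as a `struct sched_attr` for `sched_setattr(2)`. The helper name `deadline_attr` and the use of microsecond inputs are assumptions made for this example. The layout mirrors the original 48-byte version of the structure, and "reclaim: no" is modelled by leaving `SCHED_FLAG_RECLAIM` unset. The syscall itself is omitted, since it needs an architecture-specific syscall number and suitable privileges.

```python
import ctypes

SCHED_DEADLINE = 6         # policy number from <linux/sched.h>
SCHED_FLAG_RECLAIM = 0x02  # bandwidth reclaiming; unset means "reclaim: no"


class SchedAttr(ctypes.Structure):
    """Mirror of the original (48-byte) struct sched_attr layout."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),   # nanoseconds
        ("sched_deadline", ctypes.c_uint64),  # nanoseconds
        ("sched_period", ctypes.c_uint64),    # nanoseconds
    ]


def deadline_attr(runtime_us, period_us, reclaim=False):
    """Build the attribute block for a task with deadline == period.

    Hypothetical helper: inputs are integer microseconds to avoid
    floating-point rounding; the kernel expects nanoseconds.
    """
    attr = SchedAttr()
    attr.size = ctypes.sizeof(SchedAttr)
    attr.sched_policy = SCHED_DEADLINE
    attr.sched_flags = SCHED_FLAG_RECLAIM if reclaim else 0
    attr.sched_runtime = runtime_us * 1000
    attr.sched_deadline = period_us * 1000
    attr.sched_period = period_us * 1000
    return attr


# First row of the core x table: period 5 ms, runtime 0.15 ms, no reclaim.
row = deadline_attr(runtime_us=150, period_us=5000)
print(row.size, row.sched_runtime, row.sched_period, row.sched_flags)
```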

#### core y

|sched_deadline = sched_period | sched_runtime | CP max run time 90% of sched_runtime | utilisation | reclaim |
| -- | -- | -- | -- | -- |
|  5 ms  | 0.5 ms | 0.45 ms | 10.00% | no |
| 10 ms  | 1.9 ms | 1.71 ms | 19.00% | no |
| 12 ms  | 1.8 ms | 1.62 ms | 15.00% | no |
| 50 ms  | 5.5 ms | 4.95 ms | 11.00% | no |
| 50 ms  | 9.0 ms | 8.10 ms | 18.00% | no |

Total utilisation of core y is 73.00% (less than 100%)

#### core z

The third core is special: it runs 50 jobs, all with the same configuration:

|sched_deadline = sched_period | sched_runtime | CP max run time 90% of sched_runtime | utilisation |
| -- | -- | -- | -- |
|  50 ms  | 0.8 ms | 0.72 ms | 1.60% |

jobs 1-50 should run with reclaim OFF

Total utilisation of core z is 1.60% * 50 = 80.00% (less than 100%)
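
The per-core totals can be cross-checked with a short script. This is a minimal sketch that recomputes utilisation as the sum of runtime/period per core, and the 90% column as 0.9 * sched_runtime; the function names are made up for this example and the values are copied from the tables above.

```python
from fractions import Fraction

# (period_ms, runtime_ms) pairs copied from the tables above.
CORES = {
    "x": [("5", "0.15"), ("10", "1.8"), ("10", "2.1"),
          ("14", "2.3"), ("50", "8.0"), ("10", "0.5")],
    "y": [("5", "0.5"), ("10", "1.9"), ("12", "1.8"),
          ("50", "5.5"), ("50", "9.0")],
    "z": [("50", "0.8")] * 50,  # 50 identical jobs
}


def utilisation_percent(tasks):
    """Sum runtime/period over all tasks, as an exact percentage."""
    return sum(Fraction(rt) / Fraction(p) for p, rt in tasks) * 100


def ninety_percent_of(runtime_ms):
    """Value of the '90% of sched_runtime' column for one runtime."""
    return Fraction(runtime_ms) * Fraction(9, 10)


for core, tasks in CORES.items():
    u = utilisation_percent(tasks)
    assert u < 100, f"core {core} is over-committed"
    print(f"core {core}: {float(u):.2f}%")
```

Exact fractions are used so that the 14 ms row (2.3 / 14) does not pick up rounding error before the totals are compared.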

In addition to that main workload we also run further stressors, such as those from the stress-ng suite. However,
these run only on the remaining cores and inside a controlled nsjail/AppArmor sandbox.

Please let me know if you need any further details which may help figuring out what exactly is going on.

> Thanks!
> Juri

Cheers

Marcel

[1] https://lore.kernel.org/all/f532441d8b3cf35e7058305fd9cd3f2cbd3a9fac.camel@codethink.co.uk
