Message-ID: <45e4dc7a-f261-46ec-8973-0fb8d1f7b0b9@redhat.com>
Date: Wed, 28 Jan 2026 09:50:36 +0000
From: Gabriele Monaco <gmonaco@...hat.com>
To: Andrea Righi <arighi@...dia.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>,
	Joel Fernandes <joelagnelf@...dia.com>,
	David Vernet <void@...ifault.com>,
	Changwoo Min <changwoo@...lia.com>, Daniel Hodges <hodgesd@...a.com>,
	sched-ext@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched/deadline: Reset dl_server execution state on
 stop

2026-01-27T18:55:02Z Andrea Righi <arighi@...dia.com>:
> Unfortunately checking only runtime <= 0 isn't enough for the sched_ext DL
> server case:
>
> # Runtime of EXT task (PID 2025) is 0.000000 seconds
> # Runtime of RT task (PID 2026) is 4.990000 seconds
> # EXT task got 0.00% of total runtime
> not ok 2 FAIL: EXT task got less than 4.00% of runtime
>
> With the unconditional reset the EXT task gets 5% of the bandwidth. I'll
> add some debugging to figure out exactly what is happening.

Thanks for testing it. That's quite strange.

I ran your test on a kernel without the ext server. As far as I understand, the test also indirectly checks the fair server, and that part does not fail, right?
At least that's what I get on an arm64 machine with 128 CPUs.

After letting the test continue on failure I get:

# # Runtime of FAIR task (PID 22503) is 0.240000 seconds
# # Runtime of RT task (PID 22504) is 4.750000 seconds
# # FAIR task got 4.81% of total runtime
# ok 1 PASS: FAIR task got more than 4.00% of runtime
# TAP version 13
# 1..1
# # Runtime of EXT task (PID 22511) is 0.020000 seconds
# # Runtime of RT task (PID 22512) is 4.970000 seconds
# # EXT task got 0.40% of total runtime
# not ok 2 FAIL: EXT task got less than 4.00% of runtime
# TAP version 13
# 1..1
# # Runtime of FAIR task (PID 22518) is 0.240000 seconds
# # Runtime of RT task (PID 22519) is 4.750000 seconds
# # FAIR task got 4.81% of total runtime
# ok 3 PASS: FAIR task got more than 4.00% of runtime
# TAP version 13
# 1..1
# # Runtime of EXT task (PID 22525) is 0.000000 seconds
# # Runtime of RT task (PID 22526) is 4.990000 seconds
# # EXT task got 0.00% of total runtime
# not ok 4 FAIL: EXT task got less than 4.00% of runtime
# ok 24 rt_stall #

Note that the ext task is expected to starve here (I didn't apply the patches enabling the server).
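
As a sanity check on the numbers in the log above, here is a minimal sketch of how the percentages appear to be derived. It assumes the share is the task runtime divided by the sum of the task and RT runtimes, which matches the printed values, and that the pass threshold is 4.00% as in the messages. The function name, the sample list and the exact comparison against the threshold are made up for illustration and are not taken from the selftest source.

```python
# Hypothetical helper: not the selftest code, just the arithmetic that
# reproduces the percentages printed in the log.
THRESHOLD_PERCENT = 4.0  # taken from the "4.00%" in the test messages


def runtime_share(task_runtime: float, rt_runtime: float) -> float:
    """Return the task's share of the combined runtime, in percent."""
    total = task_runtime + rt_runtime
    if total == 0:
        return 0.0  # avoid dividing by zero if neither task ran
    return 100.0 * task_runtime / total


# (label, task runtime in seconds, RT runtime in seconds), copied from the log
samples = [
    ("FAIR", 0.24, 4.75),
    ("EXT", 0.02, 4.97),
    ("EXT", 0.00, 4.99),
]

for label, task_rt, rt_rt in samples:
    share = runtime_share(task_rt, rt_rt)
    # Assumption: a share at or above the threshold counts as a pass.
    verdict = "PASS" if share >= THRESHOLD_PERCENT else "FAIL"
    print(f"{label} task got {share:.2f}% of total runtime: {verdict}")
```

With these inputs the fair task lands just under 5% of the roughly 5 second window, while the ext task without its server stays well below the threshold.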

After applying all your patches [1], the ext task also passes the test (i.e. it gets boosted just fine).

I tried disabling all CPUs but CPU0 and ran the same test, and it hung (bad sign). Then I also enabled CPU1 (2 CPUs online in total), and again I see both fair and ext getting their share.

What am I missing here?

Thanks,
Gabriele

[1] - https://lore.kernel.org/lkml/20260126100050.3854740-1-arighi@nvidia.com

