Message-ID: <45e4dc7a-f261-46ec-8973-0fb8d1f7b0b9@redhat.com>
Date: Wed, 28 Jan 2026 09:50:36 +0000
From: Gabriele Monaco <gmonaco@...hat.com>
To: Andrea Righi <arighi@...dia.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>,
Joel Fernandes <joelagnelf@...dia.com>,
David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>, Daniel Hodges <hodgesd@...a.com>,
sched-ext@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched/deadline: Reset dl_server execution state on
stop
2026-01-27T18:55:02Z Andrea Righi <arighi@...dia.com>:
> Unfortunately checking only runtime <= 0 isn't enough for the sched_ext DL
> server case:
>
> # Runtime of EXT task (PID 2025) is 0.000000 seconds
> # Runtime of RT task (PID 2026) is 4.990000 seconds
> # EXT task got 0.00% of total runtime
> not ok 2 FAIL: EXT task got less than 4.00% of runtime
>
> With the unconditional reset the EXT task gets 5% of the bandwidth. I'll
> add some debugging to figure out exactly what is happening.
Thanks for testing it. That's quite strange.
I ran your test on a kernel without the ext server. As far as I understand, the test also indirectly checks the fair server, and that part does not fail, right?
At least that's what I get on an arm64 machine with 128 CPUs.
After letting the test continue on failure I get:
# # Runtime of FAIR task (PID 22503) is 0.240000 seconds
# # Runtime of RT task (PID 22504) is 4.750000 seconds
# # FAIR task got 4.81% of total runtime
# ok 1 PASS: FAIR task got more than 4.00% of runtime
# TAP version 13
# 1..1
# # Runtime of EXT task (PID 22511) is 0.020000 seconds
# # Runtime of RT task (PID 22512) is 4.970000 seconds
# # EXT task got 0.40% of total runtime
# not ok 2 FAIL: EXT task got less than 4.00% of runtime
# TAP version 13
# 1..1
# # Runtime of FAIR task (PID 22518) is 0.240000 seconds
# # Runtime of RT task (PID 22519) is 4.750000 seconds
# # FAIR task got 4.81% of total runtime
# ok 3 PASS: FAIR task got more than 4.00% of runtime
# TAP version 13
# 1..1
# # Runtime of EXT task (PID 22525) is 0.000000 seconds
# # Runtime of RT task (PID 22526) is 4.990000 seconds
# # EXT task got 0.00% of total runtime
# not ok 4 FAIL: EXT task got less than 4.00% of runtime
# ok 24 rt_stall #
Note that the ext task is expected to starve here, since I didn't apply the patches enabling the ext server.
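For reference, the percentages in the log are consistent with each task's share being its runtime divided by the sum of both runtimes. A minimal Python sketch under that assumption follows; the helper names are hypothetical and not taken from the selftest, and the strict "more than 4.00%" comparison is inferred from the log messages only:

```python
# Sketch only: recompute the runtime shares reported in the log above.
# Assumption: share = task runtime / (task runtime + RT runtime).

THRESHOLD_PCT = 4.0  # threshold taken from the "4.00%" in the log messages


def runtime_share(task_s: float, rt_s: float) -> float:
    """Return the task's share of the total runtime, in percent."""
    total = task_s + rt_s
    if total == 0:
        return 0.0
    return 100.0 * task_s / total


def passes(task_s: float, rt_s: float) -> bool:
    """True if the task got more than THRESHOLD_PCT of the total runtime."""
    return runtime_share(task_s, rt_s) > THRESHOLD_PCT


if __name__ == "__main__":
    # (task runtime, RT runtime) pairs copied from the log, in seconds
    samples = {
        "FAIR": (0.24, 4.75),
        "EXT (run 2)": (0.02, 4.97),
        "EXT (run 4)": (0.00, 4.99),
    }
    for name, (task_s, rt_s) in samples.items():
        share = runtime_share(task_s, rt_s)
        verdict = "PASS" if passes(task_s, rt_s) else "FAIL"
        print(f"{name}: {share:.2f}% of total runtime -> {verdict}")
```

With the values from the log, the FAIR runs land above the threshold and both EXT runs fall below it, matching the reported results.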
After applying all your patches [1], the ext task also passes the test (i.e. it gets boosted just fine).
I tried taking all CPUs offline except CPU0 and ran the same test, and it hung (a bad sign). Then I also brought CPU1 back online (2 CPUs online in total), and again I see both fair and ext getting their share.
What am I missing here?
Thanks,
Gabriele
[1] - https://lore.kernel.org/lkml/20260126100050.3854740-1-arighi@nvidia.com