Message-ID: <83c8989c-bd9b-4eed-8372-e280c80a93f5@arm.com>
Date: Thu, 23 Oct 2025 16:01:59 +0100
From: Christian Loehle <christian.loehle@....com>
To: Andrea Righi <arighi@...dia.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Joel Fernandes <joelagnelf@...dia.com>, Tejun Heo <tj@...nel.org>,
David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
Shuah Khan <shuah@...nel.org>, sched-ext@...ts.linux.dev,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 13/14] selftests/sched_ext: Add test for sched_ext dl_server
On 10/20/25 15:21, Christian Loehle wrote:
> On 10/20/25 14:55, Andrea Righi wrote:
>> Hi Christian,
>>
>> On Mon, Oct 20, 2025 at 02:26:17PM +0100, Christian Loehle wrote:
>>> On 10/17/25 10:26, Andrea Righi wrote:
>>>> Add a selftest to validate the correct behavior of the deadline server
>>>> for the ext_sched_class.
>>>>
>>>> [ Joel: Replaced occurrences of CFS in the test with EXT. ]
>>>>
>>>> Co-developed-by: Joel Fernandes <joelagnelf@...dia.com>
>>>> Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
>>>> Signed-off-by: Andrea Righi <arighi@...dia.com>
>>>> ---
>>>> tools/testing/selftests/sched_ext/Makefile | 1 +
>>>> .../selftests/sched_ext/rt_stall.bpf.c | 23 ++
>>>> tools/testing/selftests/sched_ext/rt_stall.c | 214 ++++++++++++++++++
>>>> 3 files changed, 238 insertions(+)
>>>> create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c
>>>> create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c
>>>
>>>
>>> Does this pass consistently for you?
>>> For a loop of 1000 runs I'm getting total runtime numbers for the EXT task of:
>>>
>>> 0.000 - 0.261 | (7)
>>> 0.261 - 0.522 | ###### (86)
>>> 0.522 - 4.437 | (0)
>>> 4.437 - 4.698 | (1)
>>> 4.698 - 4.959 | ################### (257)
>>> 4.959 - 5.220 | ################################################## (649)
>>>
>>> I'll try to see what's going wrong here...
>>
>> Is that 1000 runs of total_bw? Yeah, the small ones don't look right at
>> all, unless they're caused by some errors in the measurement (or something
>> wrong in the test itself). Still better than without the dl_server, but
>> it'd be nice to understand what's going on. :)
>>
>> I'll try to reproduce that on my side as well.
>>
>
> Yes, it's pretty much:
> for i in $(seq 0 999); do ./runner -t rt_stall ; sleep 10; done
>
> I also tried to increase the runtime of the test, but results look the same so I
> assume the DL server isn't running in the fail cases.
>
FWIW, the patch below fixes the issue and also explains why the runtime of the
test was irrelevant: in the failing runs the DL server was never started, so
running the test for longer could not help.
I wonder if we should let the test do FAIR->EXT->FAIR->EXT or something like that;
the change would be minimal and coverage would improve significantly IMO.
-----8<-----
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c5f3c39972b6..ed48c681c4c2 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2568,6 +2568,8 @@ static void dl_server_on(struct rq *rq, bool switch_all)
 	err = dl_server_init_params(&rq->ext_server);
 	WARN_ON_ONCE(err);
+	if (rq->scx.nr_running)
+		dl_server_start(&rq->ext_server);
 	rq_unlock_irqrestore(rq, &rf);
 }
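
To illustrate the failure mode the hunk above addresses, here is a minimal
userspace sketch, not kernel code. It assumes that the enqueue path starts the
server only when the first task arrives on an rq whose server is already set
up, so tasks queued before dl_server_on() runs would never trigger a start.
That assumption, and every toy_* name below, is hypothetical and only loosely
models rq->scx.nr_running and rq->ext_server.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the runqueue state the patch touches (not struct rq). */
struct toy_rq {
	unsigned int nr_running;  /* models rq->scx.nr_running */
	bool server_ready;        /* true once toy_server_on() has run */
	bool server_active;       /* models "rq->ext_server has been started" */
};

/*
 * Assumption: the enqueue path starts the server only when the first
 * task arrives, and only if the server has already been set up.
 */
static void toy_enqueue(struct toy_rq *rq)
{
	if (rq->nr_running++ == 0 && rq->server_ready)
		rq->server_active = true;
}

/* Simplification: the server stops as soon as the last task leaves. */
static void toy_dequeue(struct toy_rq *rq)
{
	assert(rq->nr_running > 0);
	if (--rq->nr_running == 0)
		rq->server_active = false;
}

/* Models dl_server_on(); with_fix selects the two lines added by the patch. */
static void toy_server_on(struct toy_rq *rq, bool with_fix)
{
	rq->server_ready = true;
	if (with_fix && rq->nr_running)
		rq->server_active = true;
}

/* Is the server active right after switching with `queued` tasks on the rq? */
static bool toy_active_after_switch(unsigned int queued, bool with_fix)
{
	struct toy_rq rq = { 0 };

	for (unsigned int i = 0; i < queued; i++)
		toy_enqueue(&rq);
	toy_server_on(&rq, with_fix);
	return rq.server_active;
}
```

In this model, toy_active_after_switch(1, false) returns false and
toy_active_after_switch(1, true) returns true: with a task already queued, only
the variant with the extra check starts the server. A task enqueued after
toy_server_on() starts it in both variants.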