Message-ID: <aEypzNZHMBBFON2h@gpd4>
Date: Sat, 14 Jun 2025 00:44:28 +0200
From: Andrea Righi <arighi@...dia.com>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>,
	David Vernet <void@...ifault.com>,
	Changwoo Min <changwoo@...lia.com>, bpf@...r.kernel.org
Subject: Re: [PATCH v3 00/10] Add a deadline server for sched_ext tasks

Hi Joel,

On Fri, Jun 13, 2025 at 02:05:03PM -0400, Joel Fernandes wrote:
> 
> 
> On 6/13/2025 1:35 PM, Joel Fernandes wrote:
> > 
> > 
> > On 6/13/2025 1:17 AM, Joel Fernandes wrote:
> >> sched_ext tasks are currently starved by RT hoggers, especially since RT
> >> throttling was replaced by deadline servers that boost only CFS tasks.
> >> Several users in the community have reported RT tasks stalling sched_ext
> >> tasks. Add a sched_ext deadline server as well, so that sched_ext tasks are
> >> also boosted and do not suffer starvation.
> >>
> >> A kselftest is also provided to verify the starvation issues are now fixed.
> >>
> >> Btw, something funky is still going on with CPU hotplug and the
> >> relinquish patch. Sometimes the sched_ext hotplug self-test locks up
> >> (./runner -t hotplug). Reverting that patch fixes it, so I suspect
> >> something is off in dl_server_remove_params() when it is called on
> >> offline CPUs.
> > 
> > I think I've made some progress with this sched_ext hotplug test, but I'm
> > not there yet. Juri, Andrea, Tejun, can you take a look at the below when
> > you get a chance?
> 
> The following patch makes the sched_ext hotplug test reliably pass for me now.
> Thoughts?

For me it gets stuck here, when the hotplug test tries to bring the CPU
offline:

TEST: hotplug
DESCRIPTION: Verify hotplug behavior
OUTPUT:
[    5.042497] smpboot: CPU 1 is now offline
[    5.069691] sched_ext: BPF scheduler "hotplug_cbs" enabled
[    5.108705] smpboot: Booting Node 0 Processor 1 APIC 0x1
[    5.149484] sched_ext: BPF scheduler "hotplug_cbs" disabled (unregistered from BPF)
EXIT: unregistered from BPF (hotplug event detected (1 going online))
[    5.204500] sched_ext: BPF scheduler "hotplug_cbs" enabled
Failed to bring CPU offline (Device or resource busy)

However, if I don't stop rq->fair_server in the scx_switching_all case,
everything seems to work (and I still don't understand why).

I didn't have much time to look at this today, I'll investigate more
tomorrow.

-Andrea

> 
> From: Joel Fernandes <joelagnelf@...dia.com>
> Subject: [PATCH] sched/deadline: Prevent setting server as started if params
>  couldn't be applied
> 
> In the following call trace, dl_server_apply_params() fails because
> dl_bw_cpus() is 0 while the CPU is being onlined.
> 
> [   11.878356] ------------[ cut here ]------------
> [   11.882592]  <TASK>
> [   11.882685]  enqueue_task_scx+0x190/0x280
> [   11.882802]  ttwu_do_activate+0xaa/0x2a0
> [   11.882925]  try_to_wake_up+0x371/0x600
> [   11.883047]  cpuhp_bringup_ap+0xd6/0x170
> [   11.883172]  cpuhp_invoke_callback+0x142/0x540
> [   11.883327]  _cpu_up+0x15b/0x270
> [   11.883450]  cpu_up+0x52/0xb0
> [   11.883576]  cpu_subsys_online+0x32/0x120
> [   11.883704]  online_store+0x98/0x130
> [   11.883824]  kernfs_fop_write_iter+0xeb/0x170
> [   11.883972]  vfs_write+0x2c7/0x430
> [   11.884091]  ksys_write+0x70/0xe0
> [   11.884209]  do_syscall_64+0xd6/0x250
> [   11.884327]  ? clear_bhb_loop+0x40/0x90
> [   11.884443]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> Starting the server at this point seems too early. Simply defer starting
> the server to the next enqueue if dl_server_apply_params() returns an
> error. In any case, we should not pretend that the server started, and
> doing so does seem to interfere with the sched_ext CPU hotplug test.
> 
> With this, the sched_ext hotplug test reliably passes.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
> ---
>  kernel/sched/deadline.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index f0cd1dbca4b8..8dd0c6d71489 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1657,8 +1657,8 @@ void dl_server_start(struct sched_dl_entity *dl_se)
>                 u64 runtime =  50 * NSEC_PER_MSEC;
>                 u64 period = 1000 * NSEC_PER_MSEC;
> 
> -               dl_server_apply_params(dl_se, runtime, period, 1);
> -
> +               if (dl_server_apply_params(dl_se, runtime, period, 1))
> +                       return;
>                 dl_se->dl_server = 1;
>                 dl_se->dl_defer = 1;
>                 setup_new_dl_entity(dl_se);
> @@ -1675,7 +1675,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
> 
>  void dl_server_stop(struct sched_dl_entity *dl_se)
>  {
> -       if (!dl_se->dl_runtime)
> +       if (!dl_se->dl_runtime || !dl_se->dl_server_active)
>                 return;
> 
>         dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
