Message-ID: <a8200977-689d-4041-936b-3a92eac1bbe9@nvidia.com>
Date: Fri, 13 Jun 2025 13:35:23 -0400
From: Joel Fernandes <joelagnelf@...dia.com>
To: linux-kernel@...r.kernel.org
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 Tejun Heo <tj@...nel.org>, David Vernet <void@...ifault.com>,
 Andrea Righi <arighi@...dia.com>, Changwoo Min <changwoo@...lia.com>,
 bpf@...r.kernel.org
Subject: Re: [PATCH v3 00/10] Add a deadline server for sched_ext tasks



On 6/13/2025 1:17 AM, Joel Fernandes wrote:
> sched_ext tasks currently are starved by RT hoggers especially since RT
> throttling was replaced by deadline servers to boost only CFS tasks. Several
> users in the community have reported issues with RT stalling sched_ext tasks.
> Add a sched_ext deadline server as well so that sched_ext tasks are also
> boosted and do not suffer starvation.
> 
> A kselftest is also provided to verify the starvation issues are now fixed.
> 
> Btw, there is still something funky going on with CPU hotplug and the
> relinquish patch. Sometimes the sched_ext's hotplug self-test locks up
> (./runner -t hotplug). Reverting that patch fixes it, so I am suspecting
> something is off in dl_server_remove_params() when it is being called on
> offline CPUs.

I think I got somewhere here with this sched_ext hotplug test but still not
there yet. Juri, Andrea, Tejun, can you take a look at the below when you get a
chance?

In the hotplug test, when the CPU is brought online, I see the following warning
fire [1]. Basically, dl_server_apply_params() fails with -EBUSY due to overflow
checks.

@@ -1657,8 +1657,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
                u64 runtime =  50 * NSEC_PER_MSEC;
                u64 period = 1000 * NSEC_PER_MSEC;

-               dl_server_apply_params(dl_se, runtime, period, 1);
-
+               WARN_ON_ONCE(dl_server_apply_params(dl_se, runtime, period, 1));
                dl_se->dl_server = 1;
                dl_se->dl_defer = 1;
                setup_new_dl_entity(dl_se);

I dug deeper, and it seems CPU 1 was brought offline and then back online, and
the warning fired during *that onlining*. During the onlining,
enqueue_task_scx() -> dl_server_start() was called but dl_server_apply_params()
returned -EBUSY.

In dl_server_apply_params() -> __dl_overflow(), it appears dl_bw_cpus()=0 and
cap=0. That is really odd and is probably the reason for the warning. Is that
because the CPU was offlined earlier and is not yet attached to the root domain?
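
To make the arithmetic concrete, here is a minimal userspace sketch of the
overflow check. It is a paraphrase written from memory of to_ratio(),
cap_scale() and __dl_overflow() in kernel/sched/, not the kernel code itself;
the constants BW_SHIFT=20 and SCHED_CAPACITY_SHIFT=10 and the helper name
dl_overflow() are assumptions of this sketch. With cap=0 the scaled limit
collapses to zero, so any non-zero new_bw is reported as an overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed constants, mirroring the kernel's fixed-point conventions. */
#define BW_SHIFT             20          /* bandwidth is a ratio in 2^-20 units */
#define SCHED_CAPACITY_SHIFT 10          /* capacity 1024 == one full CPU */
#define NSEC_PER_MSEC        1000000ULL

/* Userspace stand-in for to_ratio(): runtime/period as a fixed-point ratio. */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

/* Scale the per-CPU bandwidth limit by the summed capacity of the domain. */
static uint64_t cap_scale(uint64_t bw, uint64_t cap)
{
	return (bw * cap) >> SCHED_CAPACITY_SHIFT;
}

/*
 * Hypothetical paraphrase of __dl_overflow(): true when the bandwidth that
 * would be allocated after swapping old_bw for new_bw exceeds the scaled
 * limit. bw == UINT64_MAX stands in for "no limit" (-1 in the kernel).
 */
static bool dl_overflow(uint64_t bw, uint64_t cap, uint64_t total_bw,
			uint64_t old_bw, uint64_t new_bw)
{
	return bw != UINT64_MAX &&
	       cap_scale(bw, cap) < total_bw - old_bw + new_bw;
}
```

Plugging in the values from the warning in [1]: to_ratio(1000 * NSEC_PER_MSEC,
50 * NSEC_PER_MSEC) gives 52428, matching new_bw, and
dl_overflow(996147, 0, 0, 0, 52428) is true because the left-hand side is 0.
With cap=1024 (a single full-capacity CPU) the same call returns false.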

The remaining questions are why this happens only with my
dl_server_remove_params() change and not otherwise, and why on earth
dl_bw_cpus() is returning 0. There are at least 2 other CPUs online at the time.

Anyway, other than this mystery, I fixed all other bandwidth-related warnings
due to dl_server_remove_params(); the updated patch is at [2].

[1] Warning:

[   11.878005] DL server bandwidth overflow on CPU 1: dl_b->bw=996147, cap=0, total_bw=0, old_bw=0, new_bw=52428, dl_bw_cpus=0
[   11.878356] ------------[ cut here ]------------
[   11.878528] WARNING: CPU: 0 PID: 145 at kernel/sched/deadline.c:1670 dl_server_start+0x96/0xa0
[   11.879400] Sched_ext: hotplug_cbs (enabled+all), task: runnable_at=+0ms
[   11.879404] RIP: 0010:dl_server_start+0x96/0xa0
[   11.879732] Code: 53 10 75 1d 49 8b 86 10 0c 00 00 48 8b
[   11.882510] Call Trace:
[   11.882592]  <TASK>
[   11.882685]  enqueue_task_scx+0x190/0x280
[   11.882802]  ttwu_do_activate+0xaa/0x2a0
[   11.882925]  try_to_wake_up+0x371/0x600
[   11.883047]  cpuhp_bringup_ap+0xd6/0x170
[   11.883172]  cpuhp_invoke_callback+0x142/0x540
[   11.883327]  _cpu_up+0x15b/0x270
[   11.883450]  cpu_up+0x52/0xb0
[   11.883576]  cpu_subsys_online+0x32/0x120
[   11.883704]  online_store+0x98/0x130
[   11.883824]  kernfs_fop_write_iter+0xeb/0x170
[   11.883972]  vfs_write+0x2c7/0x430
[   11.884091]  ksys_write+0x70/0xe0
[   11.884209]  do_syscall_64+0xd6/0x250
[   11.884327]  ? clear_bhb_loop+0x40/0x90
[   11.884443]  entry_SYSCALL_64_after_hwframe+0x77/0x7f


[2]: Updated patch "sched/ext: Relinquish DL server reservations when not needed":
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=sched/scx-dlserver-boost-rebase&id=56581c2a6bb8e78593df80ad47520a8399055eae

thanks,

 - Joel


> 
> v2->v3:
>  - Removed code duplication in debugfs. Made ext interface separate.
>  - Fixed issue where rq_lock_irqsave was not used in the relinquish patch.
>  - Fixed running bw accounting issue in dl_server_remove_params.
> 
> Link to v1: https://lore.kernel.org/all/20250315022158.2354454-1-joelagnelf@nvidia.com/
> Link to v2: https://lore.kernel.org/all/20250602180110.816225-1-joelagnelf@nvidia.com/
> 
> Andrea Righi (1):
>   selftests/sched_ext: Add test for sched_ext dl_server
> 
> Joel Fernandes (9):
>   sched/debug: Fix updating of ppos on server write ops
>   sched/debug: Stop and start server based on if it was active
>   sched/deadline: Clear the defer params
>   sched: Add support to pick functions to take rf
>   sched: Add a server arg to dl_server_update_idle_time()
>   sched/ext: Add a DL server for sched_ext tasks
>   sched/debug: Add support to change sched_ext server params
>   sched/deadline: Add support to remove DL server bandwidth
>   sched/ext: Relinquish DL server reservations when not needed
> 
>  include/linux/sched.h                         |   2 +-
>  kernel/sched/core.c                           |  19 +-
>  kernel/sched/deadline.c                       |  78 +++++--
>  kernel/sched/debug.c                          | 171 +++++++++++---
>  kernel/sched/ext.c                            | 108 ++++++++-
>  kernel/sched/fair.c                           |  15 +-
>  kernel/sched/idle.c                           |   4 +-
>  kernel/sched/rt.c                             |   2 +-
>  kernel/sched/sched.h                          |  13 +-
>  kernel/sched/stop_task.c                      |   2 +-
>  tools/testing/selftests/sched_ext/Makefile    |   1 +
>  .../selftests/sched_ext/rt_stall.bpf.c        |  23 ++
>  tools/testing/selftests/sched_ext/rt_stall.c  | 213 ++++++++++++++++++
>  13 files changed, 579 insertions(+), 72 deletions(-)
>  create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c
>  create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c
> 

