linux-kernel - Re: [PATCH sched_ext/for-6.15] sched_ext: initialize built-in idle state before ops.init()

Message-ID: <Z-Jxt3n6clbABIr9@gpd3>
Date: Tue, 25 Mar 2025 10:04:55 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>, David Vernet <void@...ifault.com>,
	Changwoo Min <changwoo@...lia.com>
Cc: Joel Fernandes <joelagnelf@...dia.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH sched_ext/for-6.15] sched_ext: initialize built-in idle
 state before ops.init()

On Mon, Mar 24, 2025 at 09:57:53AM +0100, Andrea Righi wrote:
...
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 06561d6717c9a..1ba02755ae8ad 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5361,6 +5361,8 @@ static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
>  	 */
>  	cpus_read_lock();
>  
> +	scx_idle_enable(ops);
> +

Actually, I just noticed a problem: scx_idle_enable() calls
static_branch_enable/disable(), which take cpus_read_lock() internally.
Calling it while we already hold cpus_read_lock() therefore re-acquires
cpu_hotplug_lock recursively, which is not correct.

So we either need to switch to static_branch_enable/disable_cpuslocked(),
or move the scx_idle_enable() call outside the cpus_read_lock() section.

I only noticed this because of a lockdep splat on an arm64 machine (not
sure why lockdep didn't complain when I was testing this in vng):

[   65.974439] WARNING: possible recursive locking detected
...
[   65.983540] --------------------------------------------
[   65.989039] scx_bpfland/3883 is trying to acquire lock:
[   65.994447] ffffb80a490991d8 (cpu_hotplug_lock){++++}-{0:0}, at: cpus_read_lock+0x18/0x30
[   66.002941]
               but task is already holding lock:
[   66.008978] ffffb80a490991d8 (cpu_hotplug_lock){++++}-{0:0}, at: cpus_read_lock+0x18/0x30
[   66.017455]
               other info that might help us debug this:
[   66.024212]  Possible unsafe locking scenario:

[   66.030338]        CPU0
[   66.032855]        ----
[   66.035372]   lock(cpu_hotplug_lock);
[   66.039154]   lock(cpu_hotplug_lock);
[   66.042935]
                *** DEADLOCK ***

Anyway, please ignore this patch; I'll send a new one soon.

-Andrea