Message-ID: <aYEqGSGzxrYU5PZt@gpd4>
Date: Mon, 2 Feb 2026 23:50:01 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
Emil Tsalapatis <emil@...alapatis.com>, sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched_ext: Fix NULL pointer deref and warnings during
scx teardown
On Mon, Feb 02, 2026 at 10:52:04AM -1000, Tejun Heo wrote:
> On Mon, Feb 02, 2026 at 07:54:50PM +0100, Andrea Righi wrote:
> > I'm able to reproduce the NULL pointer dereference in set_cpus_allowed_scx()
> > quite easily running `stress-ng --race-sched 0` with an scx scheduler that
> > is intentionally starving tasks, triggering a stall => disable.
> >
> > I think this is what's happening:
> >
> >   CPU0                                    CPU1
> >   ----                                    ----
> >   __sched_setscheduler()
> >     task_rq_lock(p)
> >
> >     next_class = __setscheduler_class()
> >     // next_class is ext_sched_class
> >                                           scx_disable_workfn()
> >                                             scx_set_enable_state(SCX_DISABLING)
> >
> >                                             scx_task_iter_start()
> >                                             while ((p = next())) {
> >                                               ...
> >                                               p->sched_class = fair_sched_class
> >                                               ...
> >                                             }
> >                                             scx_task_iter_stop()
> >
> >                                             synchronize_rcu()
> >                                             RCU_INIT_POINTER(scx_root, NULL)
> >
> >     scoped_guard(sched_change, ...) {
> >       p->sched_class = next_class;
> >       // next_class is still ext_sched_class,
> >       // overwriting fair_sched_class!
> >     }
> >     // Guard ends, calls sched_change_end()
> >     // switching_to_scx() called
> >     // scx_root == NULL => returns early
> >
> >     task_rq_unlock(p)
> >
> >   sched_setaffinity(p)
> >     set_cpus_allowed_scx()
> >       sch = scx_root; // scx_root == NULL => BUG!
>
> Does the following patch fix the issue?
Nope, I can still trigger this (with the same modified scx_simple +
stress-ng --race-sched 0):
[ 15.899233] sched_ext: BPF scheduler "simple" disabled (runtime error)
[ 15.899447] sched_ext: simple: SCX_DSQ_LOCAL[_ON] target CPU 10 not allowed for stress-ng-race-[726]
[ 15.899586] scx_exit+0x50/0x70
[ 15.899655] task_can_run_on_remote_rq+0x8c/0x180
[ 15.899735] dispatch_to_local_dsq+0x61/0x1f0
[ 15.899900] flush_dispatch_buf+0x15e/0x190
[ 15.899994] pick_task_scx+0x2b2/0x890
[ 15.900058] __schedule+0x683/0x1250
[ 15.900135] schedule_idle+0x22/0x40
[ 15.900263] cpu_startup_entry+0x29/0x30
[ 15.900330] start_secondary+0xf8/0x100
[ 15.900394] common_startup_64+0x13e/0x148
[ 15.900539] BUG: kernel NULL pointer dereference, address: 00000000000001c0
[ 15.900660] #PF: supervisor read access in kernel mode
[ 15.900724] #PF: error_code(0x0000) - not-present page
[ 15.900787] PGD 0 P4D 0
[ 15.900822] Oops: Oops: 0000 [#1] SMP NOPTI
[ 15.900872] CPU: 9 UID: 1000 PID: 350 Comm: stress-ng-race- Not tainted 6.19.0-rc8-virtme #43 PREEMPT(voluntary)
[ 15.900992] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 15.901068] RIP: 0010:set_cpus_allowed_scx+0x1a/0xa0
[ 15.901148] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 8b 2d 37 39 6e 02 53 48 89 fb e8 16 9b fe ff <48> 8b 85 c0 01 00 00 f6 c4 10 74 50 65 48 8b 05 ba c9 4c 02 8b b0
[ 15.901378] RSP: 0018:ffffd432c0e27df8 EFLAGS: 00010086
[ 15.901442] RAX: ffff8cbc827db0d0 RBX: ffff8cbc86870000 RCX: ffff8cbc827db280
[ 15.901537] RDX: ffff8cbc86870000 RSI: ffffd432c0e27eb8 RDI: 0000000000000200
[ 15.901624] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 15.901713] R10: 0000000000000001 R11: 0000000000000001 R12: ffffd432c0e27eb8
[ 15.901807] R13: ffffd432c0e27e50 R14: ffff8cbcba218500 R15: 0000000000000000
[ 15.901900] FS: 00007f398e11eb00(0000) GS:ffff8cbd23723000(0000) knlGS:0000000000000000
[ 15.901998] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 15.902072] CR2: 00000000000001c0 CR3: 0000000103f4b000 CR4: 0000000000750ef0
[ 15.902195] PKRU: 55555554
[ 15.902232] Call Trace:
[ 15.902268] <TASK>
[ 15.902302] __set_cpus_allowed_ptr_locked+0x142/0x1c0
[ 15.902368] __set_cpus_allowed_ptr+0x64/0xa0
[ 15.902435] __sched_setaffinity+0x72/0x100
[ 15.902489] sched_setaffinity+0x281/0x360
[ 15.902543] __x64_sys_sched_setaffinity+0x50/0x80
[ 15.902608] do_syscall_64+0xbd/0xf80
[ 15.902660] entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Thanks.
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 136b01950a62..1fc2b358a175 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4234,7 +4234,13 @@ static void scx_disable_workfn(struct kthread_work *work)
> * Here, every runnable task is guaranteed to make forward progress and
> * we can safely use blocking synchronization constructs. Actually
> * disable ops.
> + *
> + * Wait for all CPUs to observe %SCX_DISABLING. Otherwise,
> + * task_should_scx() can see %SCX_ENABLED and __sched_setscheduler() put
> + * a task into sched_ext while we're migrating tasks out below.
> */
> + synchronize_rcu();
> +
> mutex_lock(&scx_enable_mutex);
>
> static_branch_disable(&__scx_switched_all);
-Andrea