Message-ID: <aYEqGSGzxrYU5PZt@gpd4>
Date: Mon, 2 Feb 2026 23:50:01 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
	Emil Tsalapatis <emil@...alapatis.com>, sched-ext@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched_ext: Fix NULL pointer deref and warnings during
 scx teardown

On Mon, Feb 02, 2026 at 10:52:04AM -1000, Tejun Heo wrote:
> On Mon, Feb 02, 2026 at 07:54:50PM +0100, Andrea Righi wrote:
> > I'm able to reproduce the NULL pointer dereference in set_cpus_allowed_scx()
> > quite easily running `stress-ng --race-sched 0` with an scx scheduler that
> > is intentionally starving tasks, triggering a stall => disable.
> > 
> > I think this is what's happening:
> > 
> >  CPU0                                      CPU1
> >  ----                                      ----
> >  __sched_setscheduler()
> >    task_rq_lock(p)
> > 
> >    next_class = __setscheduler_class()
> >      // next_class is ext_sched_class
> >                                            scx_disable_workfn()
> >                                              scx_set_enable_state(SCX_DISABLING)
> > 
> >                                              scx_task_iter_start()
> >                                              while ((p = next())) {
> >                                                ...
> >                                                p->sched_class = fair_sched_class
> >                                                ...
> >                                              }
> >                                              scx_task_iter_stop()
> > 
> >                                              synchronize_rcu()
> >                                              RCU_INIT_POINTER(scx_root, NULL)
> > 
> >    scoped_guard(sched_change, ...) {
> >      p->sched_class = next_class;
> >        // next_class is still ext_sched_class,
> >        // overwriting fair_sched_class!
> >    }
> >    // Guard ends, calls sched_change_end()
> >    //   switching_to_scx() called
> >    //   scx_root == NULL => returns early
> > 
> >    task_rq_unlock(p)
> > 
> >    sched_setaffinity(p)
> >      set_cpus_allowed_scx()
> >        sch = scx_root; // scx_root == NULL => BUG!
> 
> Does the following patch fix the issue?

Nope, I can still trigger this (with the same modified scx_simple +
stress-ng --race-sched 0):

[   15.899233] sched_ext: BPF scheduler "simple" disabled (runtime error)
[   15.899447] sched_ext: simple: SCX_DSQ_LOCAL[_ON] target CPU 10 not allowed for stress-ng-race-[726]
[   15.899586]    scx_exit+0x50/0x70
[   15.899655]    task_can_run_on_remote_rq+0x8c/0x180
[   15.899735]    dispatch_to_local_dsq+0x61/0x1f0
[   15.899900]    flush_dispatch_buf+0x15e/0x190
[   15.899994]    pick_task_scx+0x2b2/0x890
[   15.900058]    __schedule+0x683/0x1250
[   15.900135]    schedule_idle+0x22/0x40
[   15.900263]    cpu_startup_entry+0x29/0x30
[   15.900330]    start_secondary+0xf8/0x100
[   15.900394]    common_startup_64+0x13e/0x148
[   15.900539] BUG: kernel NULL pointer dereference, address: 00000000000001c0
[   15.900660] #PF: supervisor read access in kernel mode
[   15.900724] #PF: error_code(0x0000) - not-present page
[   15.900787] PGD 0 P4D 0
[   15.900822] Oops: Oops: 0000 [#1] SMP NOPTI
[   15.900872] CPU: 9 UID: 1000 PID: 350 Comm: stress-ng-race- Not tainted 6.19.0-rc8-virtme #43 PREEMPT(voluntary)
[   15.900992] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   15.901068] RIP: 0010:set_cpus_allowed_scx+0x1a/0xa0
[   15.901148] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 8b 2d 37 39 6e 02 53 48 89 fb e8 16 9b fe ff <48> 8b 85 c0 01 00 00 f6 c4 10 74 50 65 48 8b 05 ba c9 4c 02 8b b0
[   15.901378] RSP: 0018:ffffd432c0e27df8 EFLAGS: 00010086
[   15.901442] RAX: ffff8cbc827db0d0 RBX: ffff8cbc86870000 RCX: ffff8cbc827db280
[   15.901537] RDX: ffff8cbc86870000 RSI: ffffd432c0e27eb8 RDI: 0000000000000200
[   15.901624] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   15.901713] R10: 0000000000000001 R11: 0000000000000001 R12: ffffd432c0e27eb8
[   15.901807] R13: ffffd432c0e27e50 R14: ffff8cbcba218500 R15: 0000000000000000
[   15.901900] FS:  00007f398e11eb00(0000) GS:ffff8cbd23723000(0000) knlGS:0000000000000000
[   15.901998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.902072] CR2: 00000000000001c0 CR3: 0000000103f4b000 CR4: 0000000000750ef0
[   15.902195] PKRU: 55555554
[   15.902232] Call Trace:
[   15.902268]  <TASK>
[   15.902302]  __set_cpus_allowed_ptr_locked+0x142/0x1c0
[   15.902368]  __set_cpus_allowed_ptr+0x64/0xa0
[   15.902435]  __sched_setaffinity+0x72/0x100
[   15.902489]  sched_setaffinity+0x281/0x360
[   15.902543]  __x64_sys_sched_setaffinity+0x50/0x80
[   15.902608]  do_syscall_64+0xbd/0xf80
[   15.902660]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
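
For reference, the faulting instruction in the Code: line above (the byte
marked with <>) can be decoded by hand. Below is a minimal sketch in C that
does only that one decode. The helper name decode_mov_rbp_disp32() and the
fault_insn[] array are made up for illustration, and the bytes are copied
manually from the oops; this is not a general x86 decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes starting at the faulting instruction, copied by hand from the
 * "Code:" line of the oops (the byte in <> marks RIP).
 */
static const uint8_t fault_insn[] = {
	0x48, 0x8b, 0x85, 0xc0, 0x01, 0x00, 0x00,
};

/*
 * Hypothetical helper: if @insn encodes "mov r64, [rbp + disp32]"
 * (REX.W, opcode 0x8b, ModRM with mod=10 and rm=101), store the
 * displacement in *@disp and return 0. Return -1 for anything else.
 */
static int decode_mov_rbp_disp32(const uint8_t *insn, size_t len,
				 uint32_t *disp)
{
	assert(insn && disp);

	if (len < 7)
		return -1;
	if (insn[0] != 0x48 || insn[1] != 0x8b)	/* REX.W + MOV r64, r/m64 */
		return -1;
	if ((insn[2] & 0xc7) != 0x85)		/* mod=10, rm=101 (rbp) */
		return -1;

	/* disp32 is stored little-endian right after the ModRM byte */
	*disp = (uint32_t)insn[3] |
		((uint32_t)insn[4] << 8) |
		((uint32_t)insn[5] << 16) |
		((uint32_t)insn[6] << 24);
	return 0;
}
```

Calling decode_mov_rbp_disp32(fault_insn, sizeof(fault_insn), &disp) yields a
displacement of 0x1c0. With RBP = 0 and CR2 = 0x1c0 in the register dump, that
is consistent with a read at offset 0x1c0 from a NULL base pointer, which
would match scx_root being NULL, assuming RBP holds scx_root at that point (I
have not checked this against the full disassembly).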

> 
> Thanks.
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 136b01950a62..1fc2b358a175 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4234,7 +4234,13 @@ static void scx_disable_workfn(struct kthread_work *work)
>  	 * Here, every runnable task is guaranteed to make forward progress and
>  	 * we can safely use blocking synchronization constructs. Actually
>  	 * disable ops.
> +	 *
> +	 * Wait for all CPUs to observe %SCX_DISABLING. Otherwise,
> +	 * task_should_scx() can see %SCX_ENABLED and __sched_setscheduler() put
> +	 * a task into sched_ext while we're migrating tasks out below.
>  	 */
> +	synchronize_rcu();
> +
>  	mutex_lock(&scx_enable_mutex);
>  
>  	static_branch_disable(&__scx_switched_all);

-Andrea
