linux-kernel - Re: [PATCH] sched_ext: Fix NULL pointer deref and warnings during scx teardown


Message-ID: <aYH_urViwKNiamo0@gpd4>
Date: Tue, 3 Feb 2026 15:01:30 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
	Emil Tsalapatis <emil@...alapatis.com>, sched-ext@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched_ext: Fix NULL pointer deref and warnings during
 scx teardown

On Mon, Feb 02, 2026 at 11:50:05PM +0100, Andrea Righi wrote:
> On Mon, Feb 02, 2026 at 10:52:04AM -1000, Tejun Heo wrote:
> > On Mon, Feb 02, 2026 at 07:54:50PM +0100, Andrea Righi wrote:
> > > I'm able to reproduce the NULL pointer dereference in set_cpus_allowed_scx()
> > > quite easily running `stress-ng --race-sched 0` with an scx scheduler that
> > > is intentionally starving tasks, triggering a stall => disable.
> > > 
> > > I think this is what's happening:
> > > 
> > >  CPU0                                      CPU1
> > >  ----                                      ----
> > >  __sched_setscheduler()
> > >    task_rq_lock(p)
> > > 
> > >    next_class = __setscheduler_class()
> > >      // next_class is ext_sched_class
> > >                                            scx_disable_workfn()
> > >                                              scx_set_enable_state(SCX_DISABLING)
> > > 
> > >                                              scx_task_iter_start()
> > >                                              while ((p = next())) {
> > >                                                ...
> > >                                                p->sched_class = fair_sched_class
> > >                                                ...
> > >                                              }
> > >                                              scx_task_iter_stop()
> > > 
> > >                                              synchronize_rcu()
> > >                                              RCU_INIT_POINTER(scx_root, NULL)
> > > 
> > >    scoped_guard(sched_change, ...) {
> > >      p->sched_class = next_class;
> > >        // next_class is still ext_sched_class,
> > >        // overwriting fair_sched_class!
> > >    }
> > >    // Guard ends, calls sched_change_end()
> > >    //   switching_to_scx() called
> > >    //   scx_root == NULL => returns early
> > > 
> > >    task_rq_unlock(p)
> > > 
> > >    sched_setaffinity(p)
> > >      set_cpus_allowed_scx()
> > >        sch = scx_root; // scx_root == NULL => BUG!
> > 
> > Does the following patch fix the issue?
> 
> Nope, I can still trigger this (with the same modified scx_simple +
> stress-ng --race-sched 0):

A quick reproducer:
https://github.com/sched-ext/scx/tree/scx-bug

 $ make
 $ vng -vr -- "stress-ng --race-sched 0 & ./build/scheds/c/scx_bug"
 ...
 [    3.375119] BUG: kernel NULL pointer dereference, address: 00000000000001c0
 [    3.375836] RIP: 0010:set_cpus_allowed_scx+0x1a/0xa0

It happens almost immediately.

-Andrea
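
The interleaving quoted above can be replayed deterministically in userspace.
What follows is only a minimal sketch, not kernel code and not part of the
proposed patch: struct task, pick_class(), disable_scx(), commit_class() and
replay_race() are hypothetical stand-ins for the __sched_setscheduler() and
scx_disable_workfn() steps in the diagram, and locking and RCU are left out.
It shows only that a class chosen before the disable and committed after it
leaves the task on the ext class while scx_root is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel objects in the diagram. */
struct sched_class { const char *name; };
struct scx_sched { int id; };

static const struct sched_class fair_class = { "fair" };
static const struct sched_class ext_class = { "ext" };

static struct scx_sched root_sched = { 1 };
static struct scx_sched *scx_root = &root_sched; /* NULL once disabled */

struct task { const struct sched_class *sched_class; };

/* CPU0, first step: choose the target class while scx is still enabled. */
static const struct sched_class *pick_class(void)
{
	return scx_root ? &ext_class : &fair_class;
}

/* CPU1: the disable path moves the task to fair, then clears the root. */
static void disable_scx(struct task *p)
{
	p->sched_class = &fair_class;
	scx_root = NULL;
}

/* CPU0, second step: commit the earlier choice without re-checking it. */
static void commit_class(struct task *p, const struct sched_class *next)
{
	p->sched_class = next;
}

/* Returns 1 if the task sits on the ext class with no root scheduler. */
static int task_is_stranded(const struct task *p)
{
	return p->sched_class == &ext_class && scx_root == NULL;
}

/* Replays the interleaving from the diagram in program order. */
static int replay_race(void)
{
	struct task p = { &fair_class };
	const struct sched_class *next;

	scx_root = &root_sched;     /* scx enabled */
	next = pick_class();        /* CPU0: next_class is the ext class */
	assert(next == &ext_class);
	disable_scx(&p);            /* CPU1: task moved to fair, root cleared */
	commit_class(&p, next);     /* CPU0: stale choice overwrites fair */
	return task_is_stranded(&p);
}
```

Calling replay_race() from a small main() returns 1, which is the state in
which a later dereference of scx_root would fault.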