Message-ID: <aYoYECCY_k1IrGqp@slm.duckdns.org>
Date: Mon, 9 Feb 2026 07:23:28 -1000
From: Tejun Heo <tj@...nel.org>
To: Andrea Righi <arighi@...dia.com>
Cc: Emil Tsalapatis <emil@...alapatis.com>,
David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>,
Kuba Piecuch <jpiecuch@...gle.com>,
Christian Loehle <christian.loehle@....com>,
Daniel Hodges <hodgesd@...a.com>, sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] selftests/sched_ext: Add test to validate
ops.dequeue() semantics

Hello,

On Mon, Feb 09, 2026 at 04:43:20PM +0100, Andrea Righi wrote:
> > I agree with going with option 1.

I think this is the only way. The only reason this is a bit murky is that we
allow direct dispatching from ops.select_cpu(), but if you look at how that's
implemented, it doesn't really bypass the enqueue path. The task still has to
enter the enqueue path (as that's when the rq lock is grabbed and task state
can be updated) while already knowing what to do in the enqueue path.

I don't think it makes sense to consider a task to be in the BPF sched's
custody before it has passed through enqueue. Note that you can't even set
the flag - the flag field is protected by the task's rq lock.
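
FWIW, this is roughly what that direct dispatch path looks like from the
scheduler's side - just an illustrative sketch using the current kfunc names
(scx_bpf_select_cpu_dfl() / scx_bpf_dsq_insert(), assuming the usual scx
common headers), not code from this series:

s32 BPF_STRUCT_OPS(example_select_cpu, struct task_struct *p,
                   s32 prev_cpu, u64 wake_flags)
{
        bool is_idle = false;
        s32 cpu;

        cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);
        if (is_idle)
                /*
                 * "Direct dispatch": the decision is recorded here, but
                 * the task still goes through the enqueue path, which is
                 * where the rq lock is held and its state is updated.
                 */
                scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);

        return cpu;
}
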
> > For the select_cpu() edge case, how about introducing an explicit
> > kfunc scx_place_in_bpf_custody() later? Placing a task in BPF custody
> > during select_cpu() is already pretty niche, so we can assume the
> > scheduler writer knows what they're doing. In that case, let's let
> > _them_ decide when in select_cpu() the task is considered "in BPF".
> > They can also do their own locking to avoid races with locking on
> > the task context. This keeps the state machine clean for the average
> > scheduler while still handling the edge case. DYT that would work?
>
> Yeah, I was also considering introducing dedicated kfuncs so that the BPF
> scheduler can explicitly manage the "in BPF custody" state, decoupling the
> notion of BPF custody from ops.enqueue(). With such an interface, a scheduler
> could do something like:
>
> ops.select_cpu()
> {
>         s32 pid = p->pid;
>
>         scx_bpf_enter_custody(p);
>         if (!bpf_map_push_elem(&bpf_queue, &pid, 0)) {
>                 set_task_state(TASK_ENQUEUED);
>         } else {
>                 scx_bpf_exit_custody(p);
>                 set_task_state(TASK_NONE);
>         }
>
>         return prev_cpu;
> }
>
> On the implementation side, entering / leaving BPF custody is essentially
> setting / clearing SCX_TASK_IN_BPF, with the scheduler taking full
> responsibility for ensuring the flag is managed consistently: you set the
> flag => ops.dequeue() is called when the task leaves custody, you clear the
> flag => fallback to the default custody behavior.
>
> But I think this is something to explore in the future, for now I'd go with
> the easier way first. :)

We should just not do it.

Thanks.

--
tejun