Message-ID: <aYoyxIkzp0E5dL1g@gpd4>
Date: Mon, 9 Feb 2026 20:17:24 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: Emil Tsalapatis <emil@...alapatis.com>,
David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>,
Kuba Piecuch <jpiecuch@...gle.com>,
Christian Loehle <christian.loehle@....com>,
Daniel Hodges <hodgesd@...a.com>, sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] selftests/sched_ext: Add test to validate
ops.dequeue() semantics
On Mon, Feb 09, 2026 at 07:23:28AM -1000, Tejun Heo wrote:
> Hello,
>
> On Mon, Feb 09, 2026 at 04:43:20PM +0100, Andrea Righi wrote:
> > > I agree with going with option 1.
>
> I think this is the only way. The only reason this is a bit murky is because
> we allow direct dispatching from ops.select_cpu() but if you look at how
> that's implemented it doesn't really bypass the enqueue path. The task still has
> to enter the enqueue path (as that's when the rq lock is grabbed and task
> state can be updated) while already knowing what to do in the enqueue path.
> I don't think it makes sense to consider a task to be in the BPF sched's
> custody before it has passed through enqueue. Note that you can't even set
> the flag - the flag field is protected by the task's rq lock.
Agreed. And just to be clear, for the purpose of triggering ops.dequeue(),
**all** direct dispatches from ops.select_cpu() should be consistently
ignored, including dispatches to user DSQs. I'll update this behavior in
the next version: the current version treats a direct dispatch to a user DSQ
from ops.select_cpu() as placing the task in the scheduler's custody, which,
for consistency, it shouldn't.
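
For illustration, here's a rough sketch of the two direct-dispatch flavors
from ops.select_cpu() (not taken from the patch; the idle-CPU check and the
EXAMPLE_DSQ id are made up). With the change above, neither one would place
the task in BPF custody, since the task still has to pass through the
enqueue path:

s32 BPF_STRUCT_OPS(example_select_cpu, struct task_struct *p,
		   s32 prev_cpu, u64 wake_flags)
{
	bool is_idle = false;
	s32 cpu;

	cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);
	if (is_idle) {
		/* Direct dispatch to the local DSQ of the picked CPU */
		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
	} else {
		/* Direct dispatch to a user DSQ (EXAMPLE_DSQ is made up) */
		scx_bpf_dsq_insert(p, EXAMPLE_DSQ, SCX_SLICE_DFL, 0);
	}

	return cpu;
}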
>
> > > For the select_cpu() edge case, how about introducing an explicit
> > > kfunc scx_place_in_bpf_custody() later? Placing a task in BPF custody
> > > during select_cpu() is already pretty niche, so we can assume the
> > > scheduler writer knows what they're doing. In that case, let's let
> > > _them_ decide when in select_cpu() the task is considered "in BPF".
> > > They can also do their own locking to avoid races with locking on
> > > the task context. This keeps the state machine clean for the average
> > > scheduler while still handling the edge case. DYT that would work?
> >
> > Yeah, I was also considering introducing dedicated kfuncs so that the BPF
> > scheduler can explicitly manage the "in BPF custody" state, decoupling the
> > notion of BPF custody from ops.enqueue(). With such an interface, a scheduler
> > could do something like:
> >
> > ops.select_cpu()
> > {
> > 	s32 pid = p->pid;
> >
> > 	scx_bpf_enter_custody(p);
> > 	if (!bpf_map_push_elem(&bpf_queue, &pid, 0)) {
> > 		set_task_state(TASK_ENQUEUED);
> > 	} else {
> > 		scx_bpf_exit_custody(p);
> > 		set_task_state(TASK_NONE);
> > 	}
> >
> > 	return prev_cpu;
> > }
> >
> > On the implementation side, entering / leaving BPF custody is essentially
> > setting / clearing SCX_TASK_IN_BPF, with the scheduler taking full
> > responsibility for ensuring the flag is managed consistently: you set the
> > flag => ops.dequeue() is called when the task leaves custody, you clear the
> > flag => fall back to the default custody behavior.
> >
> > But I think this is something to explore in the future, for now I'd go with
> > the easier way first. :)
>
> We should just not do it.
Ack.
Thanks,
-Andrea