Message-ID: <aX5lkCVr9uU78DxL@gpd4>
Date: Sat, 31 Jan 2026 21:26:56 +0100
From: Andrea Righi <arighi@...dia.com>
To: Kuba Piecuch <jpiecuch@...gle.com>
Cc: Tejun Heo <tj@...nel.org>, David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>,
Christian Loehle <christian.loehle@....com>,
Daniel Hodges <hodgesd@...a.com>, sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org,
Emil Tsalapatis <emil@...alapatis.com>
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics

On Sat, Jan 31, 2026 at 05:53:27PM +0000, Kuba Piecuch wrote:
> On Sat Jan 31, 2026 at 9:02 AM UTC, Andrea Righi wrote:
> > On Fri, Jan 30, 2026 at 11:54:00AM +0000, Kuba Piecuch wrote:
> >> Is "local" short for "local or global", i.e. not user-created?
> >> Direct dispatching into the global DSQ also shouldn't trigger ops.dequeue(),
> >> since dispatch isn't necessary for the task to run. This follows from the last
> >> paragraph:
> >>
> >> Note that, this way, whether ops.dequeue() needs to be called agrees with
> >> whether the task needs to be dispatched to run.
> >>
> >> I agree with your points, just wanted to clarify this one thing.
> >
> > I think this should be interpreted as local DSQs only
> > (SCX_DSQ_LOCAL / SCX_DSQ_LOCAL_ON), not any built-in DSQ. SCX_DSQ_GLOBAL is
> > essentially a built-in user DSQ, provided for convenience, it's not really
> > a "direct dispatch" DSQ.
>
> SCX_DSQ_GLOBAL is significantly different from user DSQs, because balance_one()
> can pull tasks directly from SCX_DSQ_GLOBAL, while it cannot pull tasks from
> user-created DSQs.
>
> If a BPF scheduler puts a task onto SCX_DSQ_GLOBAL, then it _must_ be ok with
> balance_one() coming along and pulling that task without the BPF scheduler's
> intervention, so in that way I believe SCX_DSQ_GLOBAL is semantically quite
> similar to local DSQs.

I agree that SCX_DSQ_GLOBAL behaves differently from user-created DSQs at
the implementation level, but I think that difference shouldn't leak into
the logical model.

From a semantic point of view, dispatching a task to SCX_DSQ_GLOBAL does
not mean that the task leaves the "enqueued by BPF" state. The task is
still under the BPF scheduler's custody, not directly dispatched to a
specific CPU, and remains sched_ext-managed. The scheduler has queued the
task and it hasn't relinquished control over it.

That said, I don't have a strong opinion here. If we prefer to treat
SCX_DSQ_GLOBAL as a "direct dispatch" DSQ for the purposes of ops.dequeue()
semantics, then I'm fine with adjusting the logic accordingly (with proper
documentation).
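
To make the two readings concrete, here is a minimal stand-alone C sketch.
It is not kernel code: the enum names, the policy switch and the helper
leaves_bpf_custody() are all hypothetical and only model which DSQ kinds
would end BPF custody under each interpretation.

```c
#include <stdbool.h>

/* Hypothetical model types; none of these names are kernel API. */
enum dsq_kind {
	DSQ_LOCAL,	/* models SCX_DSQ_LOCAL */
	DSQ_LOCAL_ON,	/* models SCX_DSQ_LOCAL_ON */
	DSQ_GLOBAL,	/* models SCX_DSQ_GLOBAL */
	DSQ_USER,	/* models a user-created DSQ */
};

/* The two interpretations discussed in this thread. */
enum custody_policy {
	POLICY_LOCAL_ONLY,	/* only local DSQs end BPF custody */
	POLICY_LOCAL_OR_GLOBAL,	/* the global DSQ ends it as well */
};

/*
 * Return true if inserting a task into a DSQ of the given kind takes it
 * out of BPF custody under the given policy.
 */
static bool leaves_bpf_custody(enum dsq_kind kind, enum custody_policy policy)
{
	switch (kind) {
	case DSQ_LOCAL:
	case DSQ_LOCAL_ON:
		return true;
	case DSQ_GLOBAL:
		return policy == POLICY_LOCAL_OR_GLOBAL;
	case DSQ_USER:
	default:
		return false;
	}
}
```

The two policies differ only in the DSQ_GLOBAL case, which is exactly the
point that is still open.
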

Tejun, thoughts?

>
> >> Here's my attempt at documenting this behavior:
> >>
> >> After ops.enqueue() is called on a task, the task is owned by the BPF
> >> scheduler, provided the task wasn't direct-dispatched to a local/global DSQ.
> >> When a task is owned by the BPF scheduler, the scheduler needs to dispatch the
> >> task to a local/global DSQ in order for it to run.
> >> When the BPF scheduler loses ownership of the task, either due to dispatching it
> >> to a local/global DSQ or due to external events (core-sched pick, CPU
> >> migration, scheduling property changes), the BPF scheduler is notified through
> >> ops.dequeue() with appropriate flags (TBD).
> >
> > This looks good overall, except for the global DSQ part. Also, it might be
> > better to avoid the term “owned”, internally the kernel already uses the
> > concept of "task ownership" with a different meaning (see
> > https://lore.kernel.org/all/aVHAZNbIJLLBHEXY@slm.duckdns.org), and reusing
> > it here could be misleading.
> >
> > With that in mind, I'd probably rephrase your documentation along these
> > lines:
> >
> > After ops.enqueue() is called, the task is considered *enqueued* by the BPF
> > scheduler, unless it is directly dispatched to a local DSQ (via
> > SCX_DSQ_LOCAL or SCX_DSQ_LOCAL_ON).
> >
> > While a task is enqueued, the BPF scheduler must explicitly dispatch it to
> > a DSQ in order for it to run.
> >
> > When a task leaves the enqueued state (either because it is dispatched to a
> > non-local DSQ, or due to external events such as a core-sched pick, CPU
>
> Shouldn't it be "dispatched to a local DSQ"?

Oh yes, sorry, it should be "dispatched to a local DSQ, ...".

>
> > migration, or scheduling property changes), ops.dequeue() is invoked to
> > notify the BPF scheduler, with flags indicating the reason for the dequeue:
> > regular dispatch dequeues have no flags set, whereas dequeues triggered by
> > scheduling property changes are reported with SCX_DEQ_SCHED_CHANGE.
>
> Core-sched dequeues also have a dedicated flag, it should probably be included
> here.

Right, core-sched dequeues should be mentioned as well.

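
As a rough illustration of the flag reporting described in the quoted text,
here is a stand-alone C sketch. It is only a model: the MODEL_DEQ_* names
and bit values are made up for this example and are not the kernel's
SCX_DEQ_* definitions, and the CPU migration case is left out because no
flag for it is stated in this thread.

```c
#include <stdint.h>

/* Hypothetical reasons for a task leaving BPF custody. */
enum leave_reason {
	LEAVE_DISPATCH,		/* regular dispatch to a local DSQ */
	LEAVE_CORE_SCHED,	/* core-sched pick */
	LEAVE_SCHED_CHANGE,	/* scheduling property change */
};

/* Illustrative flag bits; the values are arbitrary, not kernel values. */
#define MODEL_DEQ_CORE_SCHED	(UINT64_C(1) << 0)
#define MODEL_DEQ_SCHED_CHANGE	(UINT64_C(1) << 1)

/* Map the reason to the flags a modelled ops.dequeue() would receive. */
static uint64_t model_dequeue_flags(enum leave_reason reason)
{
	switch (reason) {
	case LEAVE_CORE_SCHED:
		return MODEL_DEQ_CORE_SCHED;
	case LEAVE_SCHED_CHANGE:
		return MODEL_DEQ_SCHED_CHANGE;
	case LEAVE_DISPATCH:
	default:
		return 0;	/* regular dispatch: no flags set */
	}
}
```

A scheduler's dequeue callback could then tell the cases apart by testing
the flags, with a zero value meaning a regular dispatch.
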
>
> >
> > What do you think?
>
> I think using the term "enqueued" isn't very good either since it results in
> two ways in which a task can be considered enqueued:
>
> 1. Between ops.enqueue() and ops.dequeue()
> 2. Between enqueue_task_scx() and dequeue_task_scx()
>
> The two are not equivalent, since a task that's running is not enqueued
> according to 1. but is enqueued according to 2.
>
> I would be ok with it if we change it to something unambiguous, e.g.
> "BPF-enqueued", although that poses a risk of people getting lazy and using
> "enqueued" anyway.
>
> Some potential alternative terms: "resident"/"BPF-resident",
> "managed"/"BPF-managed", "dispatchable", "pending dispatch",
> or simply "pending".

I agree that "enqueued" is a very ambiguous term and we probably need
something more BPF-specific. How about a task "under BPF custody"?

Thanks,
-Andrea