Message-ID: <aX5lkCVr9uU78DxL@gpd4>
Date: Sat, 31 Jan 2026 21:26:56 +0100
From: Andrea Righi <arighi@...dia.com>
To: Kuba Piecuch <jpiecuch@...gle.com>
Cc: Tejun Heo <tj@...nel.org>, David Vernet <void@...ifault.com>,
	Changwoo Min <changwoo@...lia.com>,
	Christian Loehle <christian.loehle@....com>,
	Daniel Hodges <hodgesd@...a.com>, sched-ext@...ts.linux.dev,
	linux-kernel@...r.kernel.org,
	Emil Tsalapatis <emil@...alapatis.com>
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics

On Sat, Jan 31, 2026 at 05:53:27PM +0000, Kuba Piecuch wrote:
> On Sat Jan 31, 2026 at 9:02 AM UTC, Andrea Righi wrote:
> > On Fri, Jan 30, 2026 at 11:54:00AM +0000, Kuba Piecuch wrote:
> >> Is "local" short for "local or global", i.e. not user-created?
> >> Direct dispatching into the global DSQ also shouldn't trigger ops.dequeue(),
> >> since dispatch isn't necessary for the task to run. This follows from the last
> >> paragraph:
> >> 
> >>   Note that, this way, whether ops.dequeue() needs to be called agrees with
> >>   whether the task needs to be dispatched to run.
> >> 
> >> I agree with your points, just wanted to clarify this one thing.
> >
> > I think this should be interpreted as local DSQs only
> > (SCX_DSQ_LOCAL / SCX_DSQ_LOCAL_ON), not any built-in DSQ. SCX_DSQ_GLOBAL is
> > essentially a built-in user DSQ, provided for convenience; it's not really
> > a "direct dispatch" DSQ.
> 
> SCX_DSQ_GLOBAL is significantly different from user DSQs, because balance_one()
> can pull tasks directly from SCX_DSQ_GLOBAL, while it cannot pull tasks from
> user-created DSQs.
> 
> If a BPF scheduler puts a task onto SCX_DSQ_GLOBAL, then it _must_ be ok with
> balance_one() coming along and pulling that task without the BPF scheduler's
> intervention, so in that way I believe SCX_DSQ_GLOBAL is semantically quite
> similar to local DSQs.

I agree that SCX_DSQ_GLOBAL behaves differently from user-created DSQs at
the implementation level, but I think that difference shouldn't leak into
the logical model.
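
Just to spell out that implementation-level difference: a task sitting in
a user-created DSQ only runs once the BPF scheduler explicitly moves it to
a local DSQ, typically from ops.dispatch(), while the core can pull from
SCX_DSQ_GLOBAL on its own. A rough sketch (MY_DSQ is a made-up DSQ id,
assumed to have been registered earlier with scx_bpf_create_dsq()):

  #include <scx/common.bpf.h>

  #define MY_DSQ 0 /* hypothetical user DSQ id */

  void BPF_STRUCT_OPS(sketch_dispatch, s32 cpu, struct task_struct *prev)
  {
  	/* User DSQs need this explicit consumption step... */
  	if (scx_bpf_dsq_move_to_local(MY_DSQ))
  		return;
  	/* ...while tasks in SCX_DSQ_GLOBAL can be pulled by the core
  	 * (e.g. balance_one()) with no BPF involvement at all. */
  }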

From a semantic point of view, dispatching a task to SCX_DSQ_GLOBAL does
not mean that the task leaves the "enqueued by BPF" state. The task is
still under the BPF scheduler's custody, not directly dispatched to a
specific CPU, and remains sched_ext-managed. The scheduler has queued the
task and has not relinquished control over it.
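
To make the semantic side concrete, a minimal ops.enqueue() sketch
(should_run_here() is a made-up helper, not anything from the patch):

  void BPF_STRUCT_OPS(sketch_enqueue, struct task_struct *p, u64 enq_flags)
  {
  	if (should_run_here(p))
  		/* Direct dispatch: p will run from the local DSQ without
  		 * any further action from the scheduler, so this is the
  		 * case where no ops.dequeue() is expected. */
  		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL,
  				   enq_flags);
  	else
  		/* In the reading above, p is still "enqueued by BPF"
  		 * here, even though balance_one() may pull it on its
  		 * own. */
  		scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL,
  				   enq_flags);
  }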

That said, I don't have a strong opinion here. If we prefer to treat
SCX_DSQ_GLOBAL as a "direct dispatch" DSQ for the purposes of ops.dequeue()
semantics, then I'm fine with adjusting the logic accordingly (with proper
documentation).

Tejun, thoughts?

> 
> >> Here's my attempt at documenting this behavior:
> >> 
> >> After ops.enqueue() is called on a task, the task is owned by the BPF
> >> scheduler, provided the task wasn't direct-dispatched to a local/global DSQ.
> >> When a task is owned by the BPF scheduler, the scheduler needs to dispatch the
> >> task to a local/global DSQ in order for it to run.
> >> When the BPF scheduler loses ownership of the task, either due to dispatching it
> >> to a local/global DSQ or due to external events (core-sched pick, CPU
> >> migration, scheduling property changes), the BPF scheduler is notified through
> >> ops.dequeue() with appropriate flags (TBD).
> >
> > This looks good overall, except for the global DSQ part. Also, it might be
> > better to avoid the term “owned”: internally the kernel already uses the
> > concept of "task ownership" with a different meaning (see
> > https://lore.kernel.org/all/aVHAZNbIJLLBHEXY@slm.duckdns.org), and reusing
> > it here could be misleading.
> >
> > With that in mind, I'd probably rephrase your documentation along these
> > lines:
> >
> > After ops.enqueue() is called, the task is considered *enqueued* by the BPF
> > scheduler, unless it is directly dispatched to a local DSQ (via
> > SCX_DSQ_LOCAL or SCX_DSQ_LOCAL_ON).
> >
> > While a task is enqueued, the BPF scheduler must explicitly dispatch it to
> > a DSQ in order for it to run.
> >
> > When a task leaves the enqueued state (either because it is dispatched to a
> > non-local DSQ, or due to external events such as a core-sched pick, CPU
> 
> Shouldn't it be "dispatched to a local DSQ"?

Oh yes, sorry, it should be "dispatched to a local DSQ, ...".

> 
> > migration, or scheduling property changes), ops.dequeue() is invoked to
> > notify the BPF scheduler, with flags indicating the reason for the dequeue:
> > regular dispatch dequeues have no flags set, whereas dequeues triggered by
> > scheduling property changes are reported with SCX_DEQ_SCHED_CHANGE.
> 
> Core-sched dequeues also have a dedicated flag, it should probably be included
> here.

Right, core-sched dequeues should be mentioned as well.
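
Roughly, on the scheduler side I'd expect a handler along these lines
(just a sketch: SCX_DEQ_SCHED_CHANGE is the flag named above, and I'm
assuming the core-sched case is reported with the existing
SCX_DEQ_CORE_SCHED_EXEC flag):

  void BPF_STRUCT_OPS(sketch_dequeue, struct task_struct *p, u64 deq_flags)
  {
  	if (deq_flags & SCX_DEQ_CORE_SCHED_EXEC) {
  		/* Task pulled by a core-sched pick. */
  	} else if (deq_flags & SCX_DEQ_SCHED_CHANGE) {
  		/* Scheduling property change: drop any per-task state. */
  	} else {
  		/* Regular dispatch dequeue: the task left the enqueued
  		 * state because it was dispatched to a local DSQ. */
  	}
  }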

> 
> >
> > What do you think?
> 
> I think using the term "enqueued" isn't ideal either, since it results in
> two ways in which a task can be considered enqueued:
> 
> 1. Between ops.enqueue() and ops.dequeue()
> 2. Between enqueue_task_scx() and dequeue_task_scx()
> 
> The two are not equivalent, since a task that's running is not enqueued
> according to 1. but is enqueued according to 2.
> 
> I would be ok with it if we change it to something unambiguous, e.g.
> "BPF-enqueued", although that poses a risk of people getting lazy and using
> "enqueued" anyway.
> 
> Some potential alternative terms: "resident"/"BPF-resident",
> "managed"/"BPF-managed", "dispatchable", "pending dispatch",
> or simply "pending".

I agree that "enqueued" is an ambiguous term and we probably need
something more BPF-specific. How about a task "under BPF custody"?

Thanks,
-Andrea
