linux-kernel - Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics


Message-ID: <DG2YT5LJFC9T.AXB2OHJBQG4U@google.com>
Date: Sat, 31 Jan 2026 17:53:27 +0000
From: Kuba Piecuch <jpiecuch@...gle.com>
To: Andrea Righi <arighi@...dia.com>, Kuba Piecuch <jpiecuch@...gle.com>
Cc: Tejun Heo <tj@...nel.org>, David Vernet <void@...ifault.com>, 
	Changwoo Min <changwoo@...lia.com>, Christian Loehle <christian.loehle@....com>, 
	Daniel Hodges <hodgesd@...a.com>, <sched-ext@...ts.linux.dev>, 
	<linux-kernel@...r.kernel.org>, Emil Tsalapatis <emil@...alapatis.com>
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics

On Sat Jan 31, 2026 at 9:02 AM UTC, Andrea Righi wrote:
> On Fri, Jan 30, 2026 at 11:54:00AM +0000, Kuba Piecuch wrote:
>> Is "local" short for "local or global", i.e. not user-created?
>> Direct dispatching into the global DSQ also shouldn't trigger ops.dequeue(),
>> since dispatch isn't necessary for the task to run. This follows from the last
>> paragraph:
>> 
>>   Note that, this way, whether ops.dequeue() needs to be called agrees with
>>   whether the task needs to be dispatched to run.
>> 
>> I agree with your points, just wanted to clarify this one thing.
>
> I think this should be interpreted as local DSQs only
> (SCX_DSQ_LOCAL / SCX_DSQ_LOCAL_ON), not any built-in DSQ. SCX_DSQ_GLOBAL is
> essentially a built-in user DSQ, provided for convenience, it's not really
> a "direct dispatch" DSQ.

SCX_DSQ_GLOBAL is significantly different from user DSQs, because balance_one()
can pull tasks directly from SCX_DSQ_GLOBAL, while it cannot pull tasks from
user-created DSQs.

If a BPF scheduler puts a task onto SCX_DSQ_GLOBAL, then it _must_ be ok with
balance_one() coming along and pulling that task without the BPF scheduler's
intervention, so in that sense I believe SCX_DSQ_GLOBAL is semantically quite
similar to local DSQs.
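
To spell out the distinction I have in mind, here is a minimal sketch. It is
not kernel code: the toy_* names are hypothetical and only stand in for the
three DSQ categories discussed above. It encodes my reading, namely that the
question is whether a task on a DSQ can run without a further dispatch from
the BPF scheduler.

```c
#include <stdbool.h>

/* Hypothetical categories for illustration; these are not kernel identifiers. */
enum toy_dsq_kind {
	TOY_DSQ_LOCAL,	/* stands in for SCX_DSQ_LOCAL / SCX_DSQ_LOCAL_ON */
	TOY_DSQ_GLOBAL,	/* stands in for SCX_DSQ_GLOBAL */
	TOY_DSQ_USER,	/* stands in for a DSQ created by the BPF scheduler */
};

/*
 * Returns true if a task sitting on this kind of DSQ can end up running
 * without the BPF scheduler dispatching it again. Under the reading argued
 * above, balance_one() may pull from the global DSQ, so it is grouped with
 * the local DSQs rather than with user-created ones.
 */
static bool toy_runs_without_bpf_dispatch(enum toy_dsq_kind kind)
{
	switch (kind) {
	case TOY_DSQ_LOCAL:
	case TOY_DSQ_GLOBAL:
		return true;
	case TOY_DSQ_USER:
		return false;
	}
	return false;
}
```

Under the local-only interpretation, the TOY_DSQ_GLOBAL case would move next
to TOY_DSQ_USER instead.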

>> Here's my attempt at documenting this behavior:
>> 
>> After ops.enqueue() is called on a task, the task is owned by the BPF
>> scheduler, provided the task wasn't direct-dispatched to a local/global DSQ.
>> When a task is owned by the BPF scheduler, the scheduler needs to dispatch the
>> task to a local/global DSQ in order for it to run.
>> When the BPF scheduler loses ownership of the task, either due to dispatching it
>> to a local/global DSQ or due to external events (core-sched pick, CPU
>> migration, scheduling property changes), the BPF scheduler is notified through
>> ops.dequeue() with appropriate flags (TBD).
>
> This looks good overall, except for the global DSQ part. Also, it might be
> better to avoid the term "owned", internally the kernel already uses the
> concept of "task ownership" with a different meaning (see
> https://lore.kernel.org/all/aVHAZNbIJLLBHEXY@slm.duckdns.org), and reusing
> it here could be misleading.
>
> With that in mind, I'd probably rephrase your documentation along these
> lines:
>
> After ops.enqueue() is called, the task is considered *enqueued* by the BPF
> scheduler, unless it is directly dispatched to a local DSQ (via
> SCX_DSQ_LOCAL or SCX_DSQ_LOCAL_ON).
>
> While a task is enqueued, the BPF scheduler must explicitly dispatch it to
> a DSQ in order for it to run.
>
> When a task leaves the enqueued state (either because it is dispatched to a
> non-local DSQ, or due to external events such as a core-sched pick, CPU

Shouldn't it be "dispatched to a local DSQ"?

> migration, or scheduling property changes), ops.dequeue() is invoked to
> notify the BPF scheduler, with flags indicating the reason for the dequeue:
> regular dispatch dequeues have no flags set, whereas dequeues triggered by
> scheduling property changes are reported with SCX_DEQ_SCHED_CHANGE.

Core-sched dequeues also have a dedicated flag; it should probably be included
here.

>
> What do you think?

I think using the term "enqueued" isn't very good either since it results in
two ways in which a task can be considered enqueued:

1. Between ops.enqueue() and ops.dequeue()
2. Between enqueue_task_scx() and dequeue_task_scx()

The two are not equivalent, since a task that's running is not enqueued
according to 1. but is enqueued according to 2.
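
A toy model of the two notions, to make the mismatch concrete. This is a
sketch, not kernel code: struct toy_task and the toy_* helpers are made up
for illustration, and the dispatch step assumes the proposed semantics where
dispatching to a local DSQ triggers ops.dequeue().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task state; tracks the two "enqueued" notions separately. */
struct toy_task {
	bool bpf_enqueued;	/* 1. between ops.enqueue() and ops.dequeue() */
	bool core_enqueued;	/* 2. between enqueue_task_scx() and dequeue_task_scx() */
	bool running;
};

/* Task becomes runnable and is handed to the BPF scheduler (no direct dispatch). */
static void toy_enqueue(struct toy_task *t)
{
	assert(!t->core_enqueued);
	t->core_enqueued = true;
	t->bpf_enqueued = true;
}

/* BPF scheduler dispatches the task to a local DSQ; models ops.dequeue(). */
static void toy_dispatch_local(struct toy_task *t)
{
	assert(t->bpf_enqueued);
	t->bpf_enqueued = false;
}

/* CPU picks the task from its local DSQ and starts running it. */
static void toy_start_running(struct toy_task *t)
{
	assert(t->core_enqueued && !t->bpf_enqueued);
	t->running = true;
}

/* Task leaves the runqueue, e.g. because it blocks. */
static void toy_dequeue(struct toy_task *t)
{
	assert(t->core_enqueued);
	t->bpf_enqueued = false;
	t->core_enqueued = false;
	t->running = false;
}
```

After toy_enqueue(), toy_dispatch_local() and toy_start_running(), the task
has core_enqueued set but bpf_enqueued clear, which is exactly the case where
the two definitions disagree.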

I would be ok with it if we change it to something unambiguous, e.g.
"BPF-enqueued", although that poses a risk of people getting lazy and using
"enqueued" anyway.

Some potential alternative terms: "resident"/"BPF-resident",
"managed"/"BPF-managed", "dispatchable", "pending dispatch",
or simply "pending".

Thanks,
Kuba