linux-kernel - Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics

Message-ID: <DG1WJEB6B0AC.151EBIUYXCR55@google.com>
Date: Fri, 30 Jan 2026 11:54:00 +0000
From: Kuba Piecuch <jpiecuch@...gle.com>
To: Tejun Heo <tj@...nel.org>, Andrea Righi <arighi@...dia.com>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>, 
	Kuba Piecuch <jpiecuch@...gle.com>, Christian Loehle <christian.loehle@....com>, 
	Daniel Hodges <hodgesd@...a.com>, <sched-ext@...ts.linux.dev>, 
	<linux-kernel@...r.kernel.org>, Emil Tsalapatis <emil@...alapatis.com>
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics

Hi Tejun,

On Wed Jan 28, 2026 at 9:21 PM UTC, Tejun Heo wrote:
...
> 1. When to call ops.dequeue()?
>
> I'm not sure whether deciding whether to call ops.dequeue() solely on whether
> ops.enqueue() was called. Direct dispatch has been expanded to include other
> DSQs but was originally added as a way to shortcut the dispatch path and
> "dispatch directly" for execution from ops.select_cpu/enqueue() paths. ie.
> When a task is dispatched directly to a local DSQ, the BPF scheduler is done
> with that task - the task is now in the same state with tasks that get
> dispatched to a local DSQ from ops.dispatch().
>
> ie. What effectively decides whether a task left the BPF scheduler is
> whether the task reached a local DSQ or not, and direct dispatching into a
> local DSQ shouldn't trigger ops.dequeue() - the task never really "queues"
> on the BPF scheduler.

Is "local" short for "local or global", i.e. not user-created?
Direct dispatching into the global DSQ also shouldn't trigger ops.dequeue(),
since dispatch isn't necessary for the task to run. This follows from the last
paragraph:

  Note that, this way, whether ops.dequeue() needs to be called agrees with
  whether the task needs to be dispatched to run.

I agree with your points, just wanted to clarify this one thing.

>
> This creates another discrepancy - From ops.enqueue(), direct dispatching
> into a non-local DSQ clearly makes the task enter the BPF scheduler and thus
> its departure should trigger ops.dequeue(). What about a task which is
> direct dispatched to a non-local DSQ from ops.select_cpu()? Superficially,
> the right thing to do seems to skip ops.dequeue(). After all, the task has
> never been ops.enqueue()'d. However, I think this is another case where
> what's obvious doesn't agree with what's happening underneath.
>
> ops.select_cpu() cannot actually queue anything. It's too early. Direct
> dispatch from ops.select_cpu() is a shortcut to schedule direct dispatch
> once the enqueue path is invoked so that the BPF scheduler can avoid
> invocation of ops.enqueue() when the decision has already been made. While
> this shortcut was added for convenience (so that e.g. the BPF scheduler
> doesn't have to pass a note from ops.select_cpu() to ops.enqueue()), it has
> real performance implications as it does save a roundtrip through
> ops.enqueue() and we know that such overheads do matter for some use cases
> (e.g. maximizing FPS on certain games).
>
> So, while more subtle on the surface, I think the right thing to do is
> basing the decision to call ops.dequeue() on the task's actual state -
> ops.dequeue() should be called if the task is "on" the BPF scheduler - ie.
> if the task ran ops.select_cpu/enqueue() paths and ended up in a non-local
> DSQ or on the BPF side.
>
> The subtlety would need clear documentation and we probably want to allow
> ops.dequeue() to distinguish different cases. If you boil it down to the
> actual task state, I don't think it's that subtle - if a task is in the
> custody of the BPF scheduler, ops.dequeue() will be called. Otherwise, not.
> Note that, this way, whether ops.dequeue() needs to be called agrees with
> whether the task needs to be dispatched to run.

Here's my attempt at documenting this behavior:

After ops.enqueue() is called on a task, the task is owned by the BPF
scheduler, provided the task wasn't direct-dispatched to a local/global DSQ.
When a task is owned by the BPF scheduler, the scheduler needs to dispatch the
task to a local/global DSQ in order for it to run.
When the BPF scheduler loses ownership of the task, either due to dispatching it
to a local/global DSQ or due to external events (core-sched pick, CPU
migration, scheduling property changes), the BPF scheduler is notified through
ops.dequeue() with appropriate flags (TBD).
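
To make the rule above concrete, here is a rough userspace model in plain C.
It is only a sketch of the proposed semantics, not kernel code: every name in
it (model_task, dsq_kind, leave_reason, model_*) is made up for illustration,
and the two "reasons" stand in for the flags that are still TBD. It also
assumes that moving a task to a user-created DSQ from the dispatch path keeps
it in BPF custody, which follows from the text but is not stated explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical DSQ kinds for this model; not the kernel's identifiers. */
enum dsq_kind { DSQ_LOCAL, DSQ_GLOBAL, DSQ_USER };

/* Hypothetical stand-ins for the (TBD) ops.dequeue() flags. */
enum leave_reason { LEAVE_DISPATCH, LEAVE_EXTERNAL };

struct model_task {
	bool bpf_owned;                /* in the custody of the BPF scheduler? */
	int dequeue_calls;             /* times the modeled ops.dequeue() ran */
	enum leave_reason last_reason; /* reason passed to the last call */
};

static bool dsq_is_builtin(enum dsq_kind kind)
{
	return kind == DSQ_LOCAL || kind == DSQ_GLOBAL;
}

/* Modeled ops.dequeue(): fires only if the task is in BPF custody. */
static void model_dequeue(struct model_task *t, enum leave_reason why)
{
	if (!t->bpf_owned)
		return;
	t->bpf_owned = false;
	t->dequeue_calls++;
	t->last_reason = why;
}

/*
 * Modeled select_cpu/enqueue path. A direct dispatch to a local/global
 * DSQ never enters BPF custody; everything else does.
 */
static void model_enqueue(struct model_task *t, bool direct,
			  enum dsq_kind target)
{
	t->bpf_owned = !(direct && dsq_is_builtin(target));
}

/* Modeled dispatch of a task that the BPF scheduler currently owns. */
static void model_dispatch(struct model_task *t, enum dsq_kind target)
{
	assert(t->bpf_owned);
	if (dsq_is_builtin(target))
		model_dequeue(t, LEAVE_DISPATCH);
	/* Assumption: a user-created DSQ keeps the task in custody. */
}

/* Modeled core-sched pick, CPU migration, or property change. */
static void model_external_event(struct model_task *t)
{
	model_dequeue(t, LEAVE_EXTERNAL);
}
```

In this model, a task direct-dispatched to a local or global DSQ never sees
model_dequeue() do anything, while a task that entered custody sees exactly
one notification when it leaves, whether through dispatch or an external
event.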

Thanks,
Kuba