Message-ID: <aVFmstVjLW_QIQis@slm.duckdns.org>
Date: Sun, 28 Dec 2025 07:19:46 -1000
From: Tejun Heo <tj@...nel.org>
To: Andrea Righi <arighi@...dia.com>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
Emil Tsalapatis <emil@...alapatis.com>,
Daniel Hodges <hodgesd@...a.com>, sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics

Hello, Andrea.

On Fri, Dec 19, 2025 at 11:43:14PM +0100, Andrea Righi wrote:
...
> + Once ``ops.enqueue()`` is called, the task is considered "enqueued" and
> + is owned by the BPF scheduler. Ownership is retained until the task is
> + either dispatched (moved to a local DSQ for execution) or dequeued
> + (removed from the scheduler due to a blocking event, or to modify a
> + property, like CPU affinity, priority, etc.). When the task leaves the
> + BPF scheduler ``ops.dequeue()`` is invoked.
> +
> + **Important**: ``ops.dequeue()`` is called for *any* enqueued task,
> + regardless of whether the task is still on a BPF data structure, or it
> + is already dispatched to a DSQ (global, local, or user DSQ)
> +
> + This guarantees that every ``ops.enqueue()`` will eventually be followed
> + by a ``ops.dequeue()``. This makes it reliable for BPF schedulers to
> + track task ownership and maintain accurate accounting, such as per-DSQ
> + queued runtime sums.

While this works, from the BPF sched's POV, there's no way to tell whether
an ops.dequeue() call comes from the task actually being dequeued or is the
follow-up to the dispatch operation it just did. This won't make much
difference if ops.dequeue() is just used for accounting purposes, but a
scheduler which uses an arena data structure for queueing would likely need
to perform extra tests to tell whether the task needs to be dequeued from
the arena side. I *think* the hot path (ops.dequeue() following the task's
dispatch) can be a simple lockless test, so this may be okay, but from an
API POV, it can probably be better.

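To make the "simple lockless test" concrete, here is a rough user-space
model of what a scheduler might track per task. This is only a sketch and
not sched_ext or BPF code: the state enum, struct model_task and the
model_*() helpers are all hypothetical names, and the real arena removal is
reduced to a counter.

```c
/*
 * User-space model only. Every name below is hypothetical and exists
 * purely to illustrate the per-task state test; none of it is a
 * sched_ext or BPF API.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum task_state {
	TASK_IDLE,		/* not owned by the scheduler */
	TASK_QUEUED,		/* still on the scheduler's own (arena) queue */
	TASK_DISPATCHED,	/* already inserted into a DSQ */
};

struct model_task {
	_Atomic int state;
	int arena_removals;	/* stand-in for the real arena unlink work */
};

/* Models ops.enqueue(): the scheduler takes ownership of the task. */
void model_enqueue(struct model_task *t)
{
	atomic_store(&t->state, TASK_QUEUED);
}

/*
 * Models the dispatch path: claim the task with a single CAS. Returns
 * false if a racing dequeue already took the task away.
 */
bool model_dispatch(struct model_task *t)
{
	int expected = TASK_QUEUED;

	return atomic_compare_exchange_strong(&t->state, &expected,
					      TASK_DISPATCHED);
}

/*
 * Models ops.dequeue(): one atomic exchange tells the two cases apart.
 * Returns true only when the task was still queued on the arena side and
 * had to be removed there (the slow path).
 */
bool model_dequeue(struct model_task *t)
{
	int prev = atomic_exchange(&t->state, TASK_IDLE);

	/* In this model every dequeue follows an enqueue. */
	assert(prev != TASK_IDLE);

	if (prev == TASK_QUEUED) {
		t->arena_removals++;
		return true;
	}
	return false;	/* follow-up to a dispatch: nothing to undo */
}
```

In this model, a dequeue that follows a dispatch costs one atomic exchange
and no arena work, while a dequeue of a still-queued task takes the slow
path. Whether an actual scheduler can get away with a test this cheap
depends on how it queues tasks.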
The counter interlocking point is scx_bpf_dsq_insert(). If we can
synchronize scx_bpf_dsq_insert() and dequeue so that ops.dequeue() is not
called for a successfully inserted task, I think the semantics would be
neater - an enqueued task is either dispatched or dequeued. Due to the async
dispatch operation, this is likely difficult to do without adding extra sync
operations in scx_bpf_dsq_insert(). However, I *think* we may be able to get
rid of dspc and async inserting if we call ops.dispatch() with the rq lock
dropped. That may make the whole dispatch path simpler and the behavior
neater too. What do you think?

Thanks.
--
tejun