Message-ID: <20260126084258.3798129-1-arighi@nvidia.com>
Date: Mon, 26 Jan 2026 09:41:48 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>,
David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>
Cc: Kuba Piecuch <jpiecuch@...gle.com>,
Christian Loehle <christian.loehle@....com>,
Daniel Hodges <hodgesd@...a.com>,
sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: [PATCHSET v3 sched_ext/for-6.20] sched_ext: Fix ops.dequeue() semantics

The ops.dequeue() callback lets BPF schedulers observe when a task leaves
the scheduler, either because it is dispatched or because one of its
properties changes. However, the sched_ext core currently does not invoke
this callback consistently, so BPF schedulers can miss ops.dequeue() events.

In particular, once a task is removed from the scheduler (whether for
dispatch or due to a property change), the BPF scheduler loses visibility of
the task, and the sched_ext core does not always invoke ops.dequeue().
This breaks accurate accounting (e.g., per-DSQ sums of queued runtime) and
prevents reliable tracking of task lifecycle transitions.

This patch set fixes the semantics of ops.dequeue(), ensuring that every
ops.enqueue() is balanced by a corresponding ops.dequeue() invocation. In
addition, ops.dequeue() is now also invoked when tasks are removed from the
sched_ext class, such as on task property changes.
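
To make the accounting argument concrete, here is a minimal userspace C model of a scheduler that keeps a per-DSQ sum of queued runtime. It is not a BPF program and not code from this patch set. The names `struct dsq_acct`, `struct task_model`, `model_enqueue()` and `model_dequeue()` are hypothetical. Charging each task's slice to the DSQ is an assumed accounting policy chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of per-DSQ accounting kept by a BPF scheduler. */
struct dsq_acct {
	uint64_t queued_runtime;	/* sum of slices of tasks currently queued */
	unsigned int nr_queued;		/* number of tasks currently queued */
};

/* Hypothetical per-task state (a real scheduler might use task storage). */
struct task_model {
	uint64_t slice;			/* runtime charged to the DSQ on enqueue */
	int queued;			/* 1 between enqueue and dequeue, else 0 */
};

/* Models ops.enqueue(): charge the task's slice to the DSQ. */
static void model_enqueue(struct dsq_acct *dsq, struct task_model *p)
{
	assert(!p->queued);		/* a task must not be enqueued twice */
	dsq->queued_runtime += p->slice;
	dsq->nr_queued++;
	p->queued = 1;
}

/* Models ops.dequeue(): undo exactly the charge made at enqueue time. */
static void model_dequeue(struct dsq_acct *dsq, struct task_model *p)
{
	assert(p->queued);		/* each dequeue pairs with one enqueue */
	dsq->queued_runtime -= p->slice;
	dsq->nr_queued--;
	p->queued = 0;
}
```

When the callbacks are balanced, enqueueing two tasks and then dequeueing both brings `queued_runtime` back to 0. If one task's dequeue were never reported, its slice would stay in the sum indefinitely. That is the kind of accounting drift described above.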

To distinguish a "regular" dequeue from a property-change dequeue, a new
dequeue flag is introduced: %SCX_DEQ_SCHED_CHANGE. The flag is unset for
regular dispatch dequeues and set for dequeues caused by a property change.
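
The following sketch shows how a scheduler's dequeue path might branch on the new flag. Like the model above, it is plain userspace C rather than a BPF program. The `deq_flags` parameter mirrors the flags argument that ops.dequeue() receives. `MODEL_DEQ_SCHED_CHANGE` is a hypothetical stand-in with a placeholder bit value, not the kernel's definition of %SCX_DEQ_SCHED_CHANGE; a real scheduler would take the constant from the sched_ext enum headers. `struct deq_stats` and `model_ops_dequeue()` are also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Placeholder for SCX_DEQ_SCHED_CHANGE: the bit value is an assumption
 * for this model only, not the value defined by the kernel.
 */
#define MODEL_DEQ_SCHED_CHANGE (1ULL << 0)

/* Hypothetical counters a scheduler might keep per dequeue reason. */
struct deq_stats {
	unsigned long dispatch_dequeues;	/* flag clear: task dispatched */
	unsigned long change_dequeues;		/* flag set: property change */
};

/* Models an ops.dequeue() implementation that classifies each dequeue. */
static void model_ops_dequeue(struct deq_stats *st, uint64_t deq_flags)
{
	if (deq_flags & MODEL_DEQ_SCHED_CHANGE)
		st->change_dequeues++;
	else
		st->dispatch_dequeues++;
}
```

Checking a single bit keeps the handler independent of any other dequeue flags that may be set at the same time.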

Together, these changes allow BPF schedulers to track task ownership
reliably and maintain accurate accounting.

Changes in v3:
- Rename SCX_DEQ_ASYNC to SCX_DEQ_SCHED_CHANGE
- Handle core-sched dequeues (Kuba)
- Link to v2: https://lore.kernel.org/all/20260121123118.964704-1-arighi@nvidia.com/

Changes in v2:
- Distinguish between "dispatch" dequeues and "property change" dequeues
  (flag SCX_DEQ_ASYNC)
- Link to v1: https://lore.kernel.org/all/20251219224450.2537941-1-arighi@nvidia.com

Andrea Righi (2):
  sched_ext: Fix ops.dequeue() semantics
  selftests/sched_ext: Add test to validate ops.dequeue() semantics

 Documentation/scheduler/sched-ext.rst           |  33 ++++
 include/linux/sched/ext.h                       |  11 ++
 kernel/sched/ext.c                              |  89 +++++++++-
 kernel/sched/ext_internal.h                     |   7 +
 tools/sched_ext/include/scx/enum_defs.autogen.h |   2 +
 tools/sched_ext/include/scx/enums.autogen.bpf.h |   2 +
 tools/sched_ext/include/scx/enums.autogen.h     |   1 +
 tools/testing/selftests/sched_ext/Makefile      |   1 +
 tools/testing/selftests/sched_ext/dequeue.bpf.c | 209 ++++++++++++++++++++++++
 tools/testing/selftests/sched_ext/dequeue.c     | 182 +++++++++++++++++++++
 10 files changed, 534 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/sched_ext/dequeue.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/dequeue.c