Message-Id: <DFAXGN12DFCL.2AOFRHS9WWB9Q@etsalapatis.com>
Date: Mon, 29 Dec 2025 13:55:26 -0500
From: "Emil Tsalapatis" <emil@...alapatis.com>
To: "Andrea Righi" <arighi@...dia.com>, "Tejun Heo" <tj@...nel.org>
Cc: "David Vernet" <void@...ifault.com>, "Changwoo Min"
<changwoo@...lia.com>, "Daniel Hodges" <hodgesd@...a.com>,
<sched-ext@...ts.linux.dev>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
On Mon Dec 29, 2025 at 12:07 PM EST, Andrea Righi wrote:
> Hi Tejun,
>
> On Sun, Dec 28, 2025 at 01:38:01PM -1000, Tejun Heo wrote:
>> Hello again, again.
>>
>> On Sun, Dec 28, 2025 at 01:28:04PM -1000, Tejun Heo wrote:
>> ...
>> > So, please ignore that part. That's non-sense. I still wonder whether we can
>> > create some interlocking between scx_bpf_dsq_insert() and ops.dequeue()
>> > without making hot path slower. I'll think more about it.
>>
>> And we can't create an interlocking between scx_bpf_dsq_insert() and
>> ops.dequeue() without adding extra atomic operations in hot paths. The only
>> thing shared is task rq lock and dispatch path can't do that synchronously.
>> So, yeah, it looks like the best we can do is always letting the BPF sched
>> know and let it figure out locking and whether the task needs to be
>> dequeued from BPF side.
>
> How about setting a flag in deq_flags to distinguish between a "dispatch"
> dequeue vs a real dequeue (due to property changes or other reasons)?
>
> We should be able to pass this information in a reliable way without any
> additional synchronization in the hot paths. This would let schedulers that
> use arena data structures check the flag instead of doing their own
> internal lookups.
>
> And it would also allow us to provide both semantics:
> 1) Catch real dequeues that need special BPF-side actions (check the flag)
> 2) Track all ops.enqueue()/ops.dequeue() pairs for accounting purposes
> (ignore the flag)
>
IMO the extra flag suffices for arena-based queueing, since the arena data
structures already have to track the state of the task. Even without the
flag it should be possible to infer which state the task is in from inside
the BPF code. For example, calling .dequeue() while the task is not in an
arena queue means the task got dequeued _after_ being dispatched, while
calling .dequeue() on a queued task means we are removing it because of a
true dequeue event (e.g. sched_setaffinity() was called). The only edge case
in the logic is if a true dequeue event happens between .dispatch() and
.dequeue(), but a new flag would take care of that.
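
A rough sketch of the inference above, written as plain userspace C rather
than BPF so it can be compiled on its own. Every name here is hypothetical:
DEQ_FLAG_DISPATCH stands in for the proposed flag and is not an existing
SCX_DEQ_* value, and struct task_ctx stands in for whatever per-task state
an arena-based scheduler already keeps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit for the proposed "dispatch dequeue" flag. */
#define DEQ_FLAG_DISPATCH (1ULL << 0)

/* Per-task state the scheduler's own data structures already track. */
enum task_state { TASK_IDLE, TASK_QUEUED, TASK_DISPATCHED };

enum deq_kind { DEQ_AFTER_DISPATCH, DEQ_TRUE };

struct task_ctx {
	enum task_state state;
};

/* Models ops.enqueue(): the task enters the scheduler's queue. */
static void on_enqueue(struct task_ctx *t)
{
	t->state = TASK_QUEUED;
}

/* Models ops.dispatch(): the task leaves the scheduler's queue. */
static void on_dispatch(struct task_ctx *t)
{
	assert(t->state == TASK_QUEUED);
	t->state = TASK_DISPATCHED;
}

/*
 * Inference without the flag: a task still in the queue is being removed
 * by a true dequeue; otherwise assume the dequeue follows a dispatch.
 * This misclassifies a true dequeue that lands between dispatch and
 * dequeue, which is the edge case mentioned above.
 */
static enum deq_kind infer_from_state(const struct task_ctx *t)
{
	return t->state == TASK_QUEUED ? DEQ_TRUE : DEQ_AFTER_DISPATCH;
}

/* With the flag, the classification no longer depends on tracked state. */
static enum deq_kind classify_with_flag(uint64_t deq_flags)
{
	return (deq_flags & DEQ_FLAG_DISPATCH) ? DEQ_AFTER_DISPATCH : DEQ_TRUE;
}
```

In the edge case the task is already in TASK_DISPATCHED, so
infer_from_state() reports DEQ_AFTER_DISPATCH, while classify_with_flag()
with the bit clear reports DEQ_TRUE.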
> Thanks,
> -Andrea