Message-ID: <aVG9BBWIQLL8uJrQ@slm.duckdns.org>
Date: Sun, 28 Dec 2025 13:28:04 -1000
From: Tejun Heo <tj@...nel.org>
To: Andrea Righi <arighi@...dia.com>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
	Emil Tsalapatis <emil@...alapatis.com>,
	Daniel Hodges <hodgesd@...a.com>, sched-ext@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics

Hello,

On Sun, Dec 28, 2025 at 07:19:46AM -1000, Tejun Heo wrote:
> While this works, from the BPF sched's POV, there's no way to tell whether
> an ops.dequeue() call is from the task being actually dequeued or the
> follow-up to the dispatch operation it just did. This won't make much
> difference if ops.dequeue() is just used for accounting purposes, but a
> scheduler which uses an arena data structure for queueing would likely need
> to perform extra tests to tell whether the task needs to be dequeued from
> the arena side. I *think* the hot path (ops.dequeue() following the task's
> dispatch) can be a simple lockless test, so this may be okay, but from an
> API POV, it can probably be better.
> 
> The counterpart interlocking point is scx_bpf_dsq_insert(). If we can
> synchronize scx_bpf_dsq_insert() and dequeue so that ops.dequeue() is not
> called for a successfully inserted task, I think the semantics would be
> neater - an enqueued task is either dispatched or dequeued. Due to the async
> dispatch operation, this is likely difficult to do without adding extra sync
> operations in scx_bpf_dsq_insert(). However, I *think* we may be able to get
> rid of dspc and async inserting if we call ops.dispatch() w/ the rq lock
> dropped. That may make the whole dispatch path simpler and the behavior
> neater too. What do you think?
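
To make the first paragraph above concrete, the extra test might look
something like the following on the BPF side - task_ctx, its queued flag
and the helpers are all made-up names for illustration, not existing API:

/* sketch only - task_ctx layout and helpers are made up */
void BPF_STRUCT_OPS(sched_dequeue, struct task_struct *p, u64 deq_flags)
{
	struct task_ctx *tctx = lookup_task_ctx(p);

	if (!tctx)
		return;

	/*
	 * If the task was already popped from the arena queue by dispatch,
	 * this ops.dequeue() is just the follow-up to that dispatch and
	 * there's nothing to remove.
	 */
	if (!__sync_val_compare_and_swap(&tctx->queued, 1, 0))
		return;

	arena_queue_remove(tctx);
}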

I sat down and went through the code to see whether I was actually making
sense, and I wasn't:

The async dispatch buffering is necessary to avoid lock inversion between
the rq lock and whatever locks the BPF scheduler might be using internally.
The enqueue path runs with the rq lock held, so any lock that the BPF sched
uses in the enqueue path has to nest inside the rq lock.
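
For example, with a scheduler-internal lock (queue_lock and queue_push()
are illustrative names, e.g. a bpf_spin_lock in scheduler state):

/* ops.enqueue() is invoked with the task's rq lock already held */
void BPF_STRUCT_OPS(sched_enqueue, struct task_struct *p, u64 enq_flags)
{
	bpf_spin_lock(&queue_lock);	/* nests inside rq lock */
	queue_push(p);			/* scheduler-internal queueing */
	bpf_spin_unlock(&queue_lock);
}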

In dispatch, scx_bpf_dsq_insert() is likely to be called with the same
BPF-side lock held. If we tried to do the rq lock dancing synchronously, we
could end up trying to grab the rq lock while holding the BPF-side lock,
leading to deadlock.
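
IOW, the two paths would end up taking the two locks in opposite order
(queue_lock from the sketch above):

  enqueue path                        dispatch path (if insert were sync)
  ------------                        -----------------------------------
  lock rq->lock                       bpf_spin_lock(&queue_lock)
    bpf_spin_lock(&queue_lock)          lock rq->lock    <- inverted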

The kernel side has no control over BPF-side locking, so the asynchronous
operation is there to side-step the issue. I don't see a good way to make
this synchronous.
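
Very roughly, the buffering amounts to something like the following -
heavily simplified, and the struct/field names only approximate the actual
code:

/* heavily simplified sketch - when called from ops.dispatch(), the
 * insertion is only recorded in a per-cpu buffer (dspc)... */
__bpf_kfunc void scx_bpf_dsq_insert(struct task_struct *p, u64 dsq_id,
				    u64 slice, u64 enq_flags)
{
	struct scx_dsp_ctx *dspc = this_cpu_ptr(scx_dsp_ctx);

	dspc->buf[dspc->cursor++] = (struct scx_dsp_buf_ent){
		.task = p, .dsq_id = dsq_id,
		.slice = slice, .enq_flags = enq_flags,
	};
}

/* ...and the buffer is flushed after ops.dispatch() returns, once any
 * BPF-side locks have been released and taking rq locks is safe again */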

So, please ignore that part. That's nonsense. I still wonder whether we can
create some interlocking between scx_bpf_dsq_insert() and ops.dequeue()
without making the hot path slower. I'll think more about it.
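
e.g., maybe something along these lines, on the kernel side - entirely
hypothetical, neither the flag nor the field exists:

/*
 * Entirely hypothetical sketch: scx_bpf_dsq_insert() sets a per-task
 * flag on successful insertion and the dequeue path consumes it, so
 * ops.dequeue() is skipped for tasks which were already dispatched.
 * The open question is whether this can be done without adding an
 * atomic to the hot path.
 */
if (!test_and_clear_bit(SCX_TASK_DISPATCHED, &p->scx.disp_flags))
	ops_dequeue(p, deq_flags);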

Thanks.

-- 
tejun
