Message-ID: <20250915083815.GB3289052@noisy.programming.kicks-ass.net>
Date: Mon, 15 Sep 2025 10:38:15 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Tejun Heo <tj@...nel.org>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, longman@...hat.com, hannes@...xchg.org,
mkoutny@...e.com, void@...ifault.com, arighi@...dia.com,
changwoo@...lia.com, cgroups@...r.kernel.org,
sched-ext@...ts.linux.dev, liuwenfang@...or.com, tglx@...utronix.de
Subject: Re: [PATCH 12/14] sched: Add shared runqueue locking to
__task_rq_lock()
On Fri, Sep 12, 2025 at 07:56:21AM -1000, Tejun Heo wrote:
> It *seems* that way to me. There are two other scenarios tho.
>
> - A task can move from a non-local DSQ to another non-local DSQ at any time
> while queued. As this doesn't cause rq migration, we can probably just
> overwrite p->srq_lock to the new one. Need to think about it a bit more.
It can use task_on_rq_migrating(), exactly like 'normal' rq-to-rq
migration:
  LOCK src_dsq->lock
    p->on_rq = TASK_ON_RQ_MIGRATING;
    task_unlink_from_dsq();
  UNLOCK src_dsq->lock

  LOCK dst_dsq->lock
    dispatch_enqueue();
    p->on_rq = TASK_ON_RQ_QUEUED;
  UNLOCK dst_dsq->lock
Same reasoning as for the pick_task_scx() migration: if it observes
!p->srq_lock, then it must also observe MIGRATING, and we spin-wait
until QUEUED, at which point we see the new srq_lock.
> - A task can be queued on a BPF data structure and thus may not be on any
> DSQ. I think this can be handled by adding a raw_spinlock to task_struct
> and treating the task as if it's on its own DSQ by pointing to that one,
> and grabbing that lock when transferring that task from BPF side.
Hmm, and BPF data structures cannot have a lock associated with them?
I'm thinking they must; something has to be serializing all that.
> So, it *seems* solvable but I'm afraid it's becoming too subtle. How about
> doing something simpler and just add a per-task lock which nests inside rq
> lock which is always grabbed by [__]task_rq_lock() and optionally grabbed by
> sched classes that want to migrate tasks without grabbing the source rq
> lock? That way, we don't need to do the lock pointer dancing while achieving
> about the same result. From sched_ext's POV, grabbing that per-task lock is
> likely going to be cheaper than doing the rq lock switching, so it's way
> simpler and nothing gets worse.
I *really* don't like that. Fundamentally, a runqueue is a 'rich' data
structure. It has a container (list, tree, whatever) but also a pile of
statistics (time, vtime, counts, load-sums, averages). Adding/removing a
task from a runqueue needs all of that serialized. A per-task lock simply
cannot do this.
If you've hidden this lock inside BPF such that C cannot access it, then
your abstraction needs fixing. Surely it is possible to have a C DSQ to
mirror whatever the BPF thing does. Add a few helpers for BPF to
create/destroy DSQs (IDs) and a callback to map a task to a DSQ. Then
the C part can use the DSQ lock, and hold it while calling into whatever
the BPF side does.
Additionally, it can sanity-check the BPF thing: tasks cannot go
'missing' without C knowing where they went -- which is that bypass
problem, no?