[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aMNnLenCytO_KEKg@slm.duckdns.org>
Date: Thu, 11 Sep 2025 14:19:57 -1000
From: Tejun Heo <tj@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, longman@...hat.com, hannes@...xchg.org,
mkoutny@...e.com, void@...ifault.com, arighi@...dia.com,
changwoo@...lia.com, cgroups@...r.kernel.org,
sched-ext@...ts.linux.dev, liuwenfang@...or.com, tglx@...utronix.de
Subject: Re: [PATCH 12/14] sched: Add shared runqueue locking to
__task_rq_lock()
Hello,
On Wed, Sep 10, 2025 at 05:44:21PM +0200, Peter Zijlstra wrote:
> @@ -703,17 +703,24 @@ void double_rq_lock(struct rq *rq1, stru
> struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
> __acquires(rq->lock)
> {
> + raw_spinlock_t *slock;
> struct rq *rq;
>
> lockdep_assert_held(&p->pi_lock);
>
> for (;;) {
> rq = task_rq(p);
> + slock = p->srq_lock;
> raw_spin_rq_lock(rq);
> - if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
> + if (slock)
> + raw_spin_lock(slock);
> + if (likely(rq == task_rq(p) && !task_on_rq_migrating(p) &&
> + (!slock || p->srq_lock == slock))) {
> rq_pin_lock(rq, rf);
> return rq;
> }
With the !slock condition, the following scenario is possible:
__task_rq_lock()
slock = p->srq_lock; /* NULL */
dispatch_enqueue()
p->srq_lock = &dsq->lock;
enqueue finishes
raw_spin_rq_lock(rq);
rq is the same, $slock is NULL, return
do something assuming p is locked down p gets dispatched to another rq
I'm unclear on when p->srq_lock would be safe to set and clear, so the goal
is that whoever does [__]task_rq_lock() ends up waiting on the dsq lock that
the task is queued on, and if we can exclude other sched operations that
way, we don't have to hold source rq lock when moving the task to another rq
for execution, right?
In the last patch, it's set on dispatch_enqueue() and cleared when the task
leaves the DSQ. Let's consider a simple scenario where a task gets enqueued,
gets put on a non-local DSQ and then dispatched to a local DSQ, Assuming
everything works out and we don't have to lock the source rq for migration,
we'd be depending on task_rq_lock() reliably hitting p->srq_lock to avoid
races, but I'm not sure how this would work. Let's say p is currently
associated with CPU1 on a non-local DSQ w/ p->srq_lock set to its source
DSQ.
pick_task_ext() on CPU0 task property change on CPU1
locks the DSQ
picks p
task_unlink_from_dsq() task_rq_lock();
p->srq_lock = NULL; lock rq on CPU1
p is moved to local DSQ sees p->src_lock == NULL
return
p starts running
anything can happen
proceed with property change
What am I missing?
Thanks.
--
tejun
Powered by blists - more mailing lists