Message-ID: <f3c78a55-0f09-44ab-8ce0-9658e534564d@amd.com>
Date: Thu, 30 Oct 2025 13:02:23 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: John Stultz <jstultz@...gle.com>, LKML <linux-kernel@...r.kernel.org>
CC: Joel Fernandes <joelagnelf@...dia.com>, Qais Yousef <qyousef@...alina.io>,
	Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, "Juri
 Lelli" <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>, Valentin Schneider
	<vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, Ben Segall
	<bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Will Deacon <will@...nel.org>, Waiman Long
	<longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, "Paul E. McKenney"
	<paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>, Xuewen Yan
	<xuewen.yan94@...il.com>, Thomas Gleixner <tglx@...utronix.de>, "Daniel
 Lezcano" <daniel.lezcano@...aro.org>, Suleiman Souhlal <suleiman@...gle.com>,
	kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>,
	<kernel-team@...roid.com>
Subject: Re: [PATCH v23 3/9] sched/locking: Add special
 p->blocked_on==PROXY_WAKING value for proxy return-migration
Hello John,
On 10/30/2025 5:48 AM, John Stultz wrote:
> As we add functionality to proxy execution, we may migrate a
> donor task to a runqueue where it can't run due to cpu affinity.
> Thus, we must be careful to ensure we return-migrate the task
> back to a cpu in its cpumask when it becomes unblocked.
> 
> Peter helpfully provided the following example with pictures:
> "Suppose we have a ww_mutex cycle:
> 
>                   ,-+-* Mutex-1 <-.
>         Task-A ---' |             | ,-- Task-B
>                     `-> Mutex-2 *-+-'
> 
> Where Task-A holds Mutex-1 and tries to acquire Mutex-2, and
> where Task-B holds Mutex-2 and tries to acquire Mutex-1.
> 
> Then the blocked_on->owner chain will go in circles.
> 
>         Task-A  -> Mutex-2
>           ^          |
>           |          v
>         Mutex-1 <- Task-B
> 
> We need two things:
> 
>  - find_proxy_task() to stop iterating the circle;
> 
>  - the woken task to 'unblock' and run, such that it can
>    back-off and re-try the transaction.
> 
> Now, the current code [without this patch] does:
>         __clear_task_blocked_on();
>         wake_q_add();
> 
> And surely clearing ->blocked_on is sufficient to break the
> cycle.
> 
> Suppose it is Task-B that is made to back-off, then we have:
> 
>   Task-A -> Mutex-2 -> Task-B (no further blocked_on)
> 
> and it would attempt to run Task-B. Or worse, it could directly
> pick Task-B and run it, without ever getting into
> find_proxy_task().
> 
> Now, here is a problem because Task-B might not be runnable on
> the CPU it is currently on; and because !task_is_blocked() we
> don't get into the proxy paths, so nobody is going to fix this
> up.
> 
> Ideally we would have dequeued Task-B alongside of clearing
> ->blocked_on, but alas, [the lock ordering prevents us from
> getting the task_rq_lock() and] spoils things."
> 
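The chain walk Peter describes can be sketched in userspace as follows. This is a toy model only: the `cyc_task`, `cyc_lock` and `cyc_walk` names and the step bound are hypothetical and are not taken from the patch or from find_proxy_task(); every lock is assumed to have an owner.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; these are NOT the kernel's structures. */
struct cyc_task;

struct cyc_lock {
	struct cyc_task *owner;		/* assumed non-NULL in this sketch */
};

struct cyc_task {
	struct cyc_lock *blocked_on;	/* NULL when the task is not blocked */
};

/*
 * Follow blocked_on->owner starting at @donor.
 * Returns the task at the end of the chain (the one that can run),
 * or NULL if the walk comes back to @donor or exceeds @max_steps,
 * i.e. the chain goes in circles.
 */
static struct cyc_task *cyc_walk(struct cyc_task *donor, int max_steps)
{
	struct cyc_task *p = donor;

	assert(donor != NULL);

	for (int i = 0; i < max_steps; i++) {
		if (!p->blocked_on)
			return p;		/* end of chain reached */

		p = p->blocked_on->owner;	/* hop: task -> mutex -> owner */
		if (p == donor)
			return NULL;		/* back where we started: cycle */
	}

	return NULL;				/* bound hit: treat as a cycle */
}
```

With Task-A blocked on Mutex-2 (owned by Task-B) and Task-B blocked on Mutex-1 (owned by Task-A), the walk reports a cycle; once Task-B's blocked_on is cleared, the walk from Task-A ends at Task-B, which matches the "Task-A -> Mutex-2 -> Task-B" picture in the quoted text.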
> Thus we need more than just a binary concept of the task being
> blocked on a mutex or not.
> 
> So allow setting blocked_on to PROXY_WAKING as a special value
> which specifies the task is no longer blocked, but needs to
> be evaluated for return migration *before* it can be run.
With the updated commit log, I can now truly appreciate the need for
the tri-state. Thank you for the detailed explanation.
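
A minimal userspace sketch of the three states described in the commit log above, for illustration only. The `toy_*` names, the sentinel encoding, and the single-word CPU mask are assumptions made up for this example; they are not the definitions used in the patch, and the actual return-migration handling arrives in a later patch of the series.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; NOT the kernel's definitions. */
struct toy_mutex {
	int id;
};

struct toy_task {
	struct toy_mutex *blocked_on;	/* NULL, TOY_PROXY_WAKING, or a real mutex */
	unsigned long cpus_mask;	/* bit n set => task may run on CPU n */
};

/* Hypothetical sentinel: the address of a dummy object no task can block on. */
static struct toy_mutex toy_proxy_waking_obj;
#define TOY_PROXY_WAKING (&toy_proxy_waking_obj)

enum toy_state { TOY_NOT_BLOCKED, TOY_BLOCKED, TOY_WAKING };
enum toy_action { TOY_RUN, TOY_PROXY, TOY_RETURN_MIGRATE };

static enum toy_state toy_state(const struct toy_task *p)
{
	if (!p->blocked_on)
		return TOY_NOT_BLOCKED;
	if (p->blocked_on == TOY_PROXY_WAKING)
		return TOY_WAKING;
	return TOY_BLOCKED;
}

/* Wake path: rather than clearing blocked_on outright, mark the task waking. */
static void toy_mark_waking(struct toy_task *p)
{
	assert(p->blocked_on != NULL);
	p->blocked_on = TOY_PROXY_WAKING;
}

/* Pick path: decide what to do with @p when it is selected on @cpu. */
static enum toy_action toy_pick_action(struct toy_task *p, int cpu)
{
	switch (toy_state(p)) {
	case TOY_BLOCKED:
		return TOY_PROXY;		/* follow the chain to the lock owner */
	case TOY_WAKING:
		if (!(p->cpus_mask & (1UL << cpu)))
			return TOY_RETURN_MIGRATE;	/* wrong CPU: send it back first */
		p->blocked_on = NULL;		/* evaluated; now truly unblocked */
		return TOY_RUN;
	default:
		return TOY_RUN;
	}
}
```

The point of the middle state is that a woken task is never run directly on a CPU outside its mask: it stays distinguishable from a plain unblocked task until the affinity check has been made.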
Feel free to include:
Reviewed-by: K Prateek Nayak <kprateek.nayak@....com>
-- 
Thanks and Regards,
Prateek
> 
> This will then be used in a later patch to handle proxy
> return-migration.
> 
> Signed-off-by: John Stultz <jstultz@...gle.com>