linux-kernel - Re: [PATCH v22 2/6] sched/locking: Add blocked_on_state to provide necessary tri-state for proxy return-migration

Message-ID: <CANDhNCppAhmF5MX9wP06tPD=H4cpKqppHEPuMgxH+Z=cT9YB5Q@mail.gmail.com>
Date: Thu, 16 Oct 2025 15:23:00 -0700
From: John Stultz <jstultz@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Joel Fernandes <joelagnelf@...dia.com>, 
	Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>, 
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Valentin Schneider <vschneid@...hat.com>, 
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, 
	Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>, Will Deacon <will@...nel.org>, 
	Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>, 
	Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Suleiman Souhlal <suleiman@...gle.com>, kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>, 
	kernel-team@...roid.com
Subject: Re: [PATCH v22 2/6] sched/locking: Add blocked_on_state to provide
 necessary tri-state for proxy return-migration

On Mon, Oct 13, 2025 at 7:43 PM John Stultz <jstultz@...gle.com> wrote:
> On Thu, Oct 9, 2025 at 4:43 AM Peter Zijlstra <peterz@...radead.org> wrote:
> >  - I'm confliced on having TTWU fix up PROXY_STOP; strictly not required
> >    I think, but might improve performance -- if so, include numbers in
> >    patch that adds it -- which should be a separate patch from the one
> >    that adds PROXY_STOP.
>
> Ok, I'll work to split that logic out. The nice thing in ttwu is we
> already end up taking the rq lock in ttwu_runnable() when we do the
> dequeue so yeah I expect it would help performance.

So, I thought this wouldn't be hard, but it turns out there's some
subtlety in trying to separate the ttwu changes.

First, I am using PROXY_WAKING instead of PROXY_STOP since it seemed
clearer and better aligned with my previous mental model with BO_WAKING.

One of the issues shows up when we go through the following path:
  mutex_unlock_slowpath()/ww_mutex_die()/ww_mutex_wound()
  ->  tsk->blocked_on = PROXY_WAKING
      wake_q_add(tsk)
      ...
      wake_up_q()
      ->  wake_up_process()

The wake_up_process() call goes through try_to_wake_up(), hits the
ttwu_runnable() case, and sets the task state to TASK_RUNNING.

Then on the cpu where that task is enqueued:
  __schedule()
  -> find_proxy_task()
     -> if (p->blocked_on == PROXY_WAKING)
           proxy_force_return(rq, p);

In v22, the proxy_force_return() logic would call block_task(p),
clear_task_blocked_on(p) and then wake_up_process(p):
https://github.com/johnstultz-work/linux-dev/blob/proxy-exec-v22-6.17-rc6/kernel/sched/core.c#L7117

However, since the task state has already been set to TASK_RUNNING,
the second wakeup ends up short-circuiting at ttwu_state_match(), and
the now-blocked task is left dequeued forever.
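
To make the wake/block/wake sequence above concrete, here is a rough
userspace toy model. It is only a sketch and not kernel code: the
toy_* names, the two-value state enum, and the boolean standing in for
blocked_on == PROXY_WAKING are all hypothetical simplifications, and
locking is ignored entirely.

```c
/* Userspace toy model only; NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum toy_state { TOY_RUNNING, TOY_UNINTERRUPTIBLE };

struct toy_task {
	enum toy_state state;
	bool on_rq;		/* is the task enqueued on a runqueue? */
	bool proxy_waking;	/* stand-in for blocked_on == PROXY_WAKING */
};

/* Models the early bail-out: only a sleeping task is eligible for wakeup. */
static bool toy_state_match(const struct toy_task *p)
{
	return p->state != TOY_RUNNING;
}

/* Models wake_up_process(): returns true if this call actually woke the task. */
static bool toy_wake_up(struct toy_task *p)
{
	if (!toy_state_match(p))
		return false;	/* already running: short-circuit */
	p->state = TOY_RUNNING;
	p->on_rq = true;	/* stays (or becomes) enqueued */
	return true;
}

/* Models block_task(): the task is dequeued from its runqueue. */
static void toy_block_task(struct toy_task *p)
{
	p->on_rq = false;
}

/* Replays the sequence and returns the task's final on_rq value. */
bool toy_v22_sequence_final_on_rq(void)
{
	/* Blocked on a mutex, but left on the runqueue for proxying. */
	struct toy_task p = {
		.state = TOY_UNINTERRUPTIBLE, .on_rq = true, .proxy_waking = false,
	};

	/* Unlock path: mark PROXY_WAKING, then the first wakeup. */
	p.proxy_waking = true;
	toy_wake_up(&p);
	assert(p.state == TOY_RUNNING && p.on_rq);

	/* Scheduler side: v22-style force return (block, clear, wake). */
	if (p.proxy_waking) {
		toy_block_task(&p);
		p.proxy_waking = false;
		toy_wake_up(&p);	/* bails out: state is already running */
	}
	return p.on_rq;
}
```

In this model toy_v22_sequence_final_on_rq() returns false: the second
toy_wake_up() bails at the state check, so nothing re-enqueues the
task after toy_block_task().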

So, I've reworked proxy_force_return() to be sort of an open-coded
try_to_wake_up(): it calls select_task_rq() to pick the return cpu and
then basically deactivates/activates the task to migrate it over.  It
was nice to reuse block_task() and wake_up_process() previously, but
the wake/block/wake behavior tripping into the dequeued-forever issue
worries me: it could have been tripped in rare cases previously with my
series (despite having a check after ttwu_state_match() for this case).
So either I'll keep this approach, or maybe we should add some extra
checking in ttwu_state_match() for on_rq before bailing?  Let me know
if you have thoughts there.
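
The two options above can be contrasted in a small userspace toy
model. This is a sketch under loud assumptions, not the actual patch:
the sim_* names are hypothetical, the target cpu is passed in rather
than chosen by select_task_rq(), and the locking questions that decide
whether an on_rq check is safe in the real ttwu_state_match() are not
modeled at all.

```c
/* Userspace toy model only; NOT kernel code. All sim_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum sim_state { SIM_RUNNING, SIM_UNINTERRUPTIBLE };

struct sim_task {
	enum sim_state state;
	bool on_rq;	/* is the task enqueued on a runqueue? */
	int cpu;	/* cpu whose runqueue the task belongs to */
};

/*
 * Option A: do not bail out on state alone. A task that is marked
 * running but is not on any runqueue still needs to be enqueued.
 */
static bool sim_wake_up_checking_on_rq(struct sim_task *p, int target_cpu)
{
	if (p->state == SIM_RUNNING && p->on_rq)
		return false;	/* genuinely nothing to do */
	p->state = SIM_RUNNING;
	if (!p->on_rq) {
		p->cpu = target_cpu;
		p->on_rq = true;
	}
	return true;
}

/*
 * Option B: open-coded return migration. The task is deactivated,
 * moved, and activated directly, so no second wakeup is involved.
 */
static void sim_force_return(struct sim_task *p, int target_cpu)
{
	p->on_rq = false;	/* deactivate on the current runqueue */
	p->cpu = target_cpu;	/* stand-in for the select_task_rq() result */
	p->on_rq = true;	/* activate on the target runqueue */
}

/* Option A applied to the stuck case: running state, but dequeued. */
bool sim_option_a_requeues(void)
{
	struct sim_task p = { .state = SIM_RUNNING, .on_rq = false, .cpu = 0 };
	bool woke = sim_wake_up_checking_on_rq(&p, 1);

	return woke && p.on_rq && p.cpu == 1;
}

/* Option B applied to a task already set running by the first wakeup. */
bool sim_option_b_requeues(void)
{
	struct sim_task p = { .state = SIM_RUNNING, .on_rq = true, .cpu = 0 };

	sim_force_return(&p, 1);
	assert(p.state == SIM_RUNNING);	/* the state is never consulted */
	return p.on_rq && p.cpu == 1;
}
```

In the model both options leave the task enqueued on the target cpu.
The difference is where the fix lives: option A changes the generic
wakeup check, while option B keeps the change local to the
force-return path.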

Hopefully I'll have the patches cleaned up and out again soon.

thanks
-john