Message-ID: <20250818221300.2948078-1-jstultz@google.com>
Date: Mon, 18 Aug 2025 22:12:50 +0000
From: John Stultz <jstultz@...gle.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: John Stultz <jstultz@...gle.com>, Joel Fernandes <joelagnelf@...dia.com>, 
	Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Valentin Schneider <vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>, 
	Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Suleiman Souhlal <suleiman@...gle.com>, kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>, 
	kernel-team@...roid.com
Subject: [PATCH v21 0/6] Donor Migration for Proxy Execution (v21)

Hey All,

I wanted to continue pushing for feedback on the next chunk of
the series: Donor Migration.

I had initially planned to resend v20 last week, but while
working up the fix[1] for the warning issue that cropped up after
single rq proxying landed in 6.17-rc, I got to thinking a bit
more about the ww_mutex paths.

As part of this chunk, I previously had logic where the ww_mutex
paths took the blocked_lock of the task being woken (either the
lock waiter->task or the owner), but from a context in
__mutex_lock_common() where we already held
current->blocked_lock. This required using the spin_lock_nested()
annotation to keep lockdep happy, and I was leaning on the logic
that there is an implied order between the running current and
the existing not-running lock waiters, which should avoid loops.
In the wound case, there is also an order used if the owner’s
context is younger, which sounded likely to avoid loops.

However, after thinking more about the wound case where we are
wounding a lock owner, since that owner is not waiting and could
be trying to acquire a mutex current owns, I couldn’t quite
convince myself we couldn’t get into an ABBA-style deadlock with
the nested blocked_lock accesses (I’ve not been able to contrive
it to happen, but that doesn’t prove anything).

So the main difference in v21 is a rework of how we hold the
blocked_lock in the __mutex_lock_common() code, reducing its
scope so we don’t call into ww_mutex paths while holding it. The
lock->wait_lock still serializes things at the top level, but the
blocked_lock is no longer held in parallel for that whole
section, and is focused on its purpose of protecting the
blocked_on, blocked_on_state and similar proxy-related values in
the task struct.
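
As a rough illustration of the narrowed locking scope described
above, here is a single-threaded user-space toy. It is not code
from the patches: every toy_* name is hypothetical, the "locks"
only record whether they are held, and toy_ww_wake() merely stands
in for a ww_mutex wound/die wakeup. The point it tries to show is
that the slow path drops its own blocked_lock before entering the
ww path, so no more than one blocked_lock is ever held at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: records only whether it is held (no real concurrency). */
struct toy_lock { bool held; };

struct toy_mutex { struct toy_lock wait_lock; };

struct toy_task {
	struct toy_lock blocked_lock;	/* guards blocked_on only */
	struct toy_mutex *blocked_on;
};

static int toy_blocked_locks_held;	/* blocked_locks held right now */
static int toy_max_blocked_locks;	/* high-water mark of the above */

static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_release(struct toy_lock *l) { assert(l->held); l->held = false; }

static void toy_blocked_lock(struct toy_task *p)
{
	toy_acquire(&p->blocked_lock);
	if (++toy_blocked_locks_held > toy_max_blocked_locks)
		toy_max_blocked_locks = toy_blocked_locks_held;
}

static void toy_blocked_unlock(struct toy_task *p)
{
	toy_blocked_locks_held--;
	toy_release(&p->blocked_lock);
}

/* Hypothetical stand-in for a ww_mutex wound/die wakeup of another task. */
static void toy_ww_wake(struct toy_task *other)
{
	toy_blocked_lock(other);
	other->blocked_on = NULL;
	toy_blocked_unlock(other);
}

/* Slow-path sketch: wait_lock held throughout, blocked_lock only briefly. */
static void toy_lock_slowpath(struct toy_task *curr, struct toy_mutex *m,
			      struct toy_task *other)
{
	toy_acquire(&m->wait_lock);

	toy_blocked_lock(curr);
	curr->blocked_on = m;
	toy_blocked_unlock(curr);	/* dropped before the ww path */

	toy_ww_wake(other);		/* takes only other's blocked_lock */

	toy_release(&m->wait_lock);
}
```

With the older nested approach, toy_ww_wake() would have run while
curr's blocked_lock was still held, pushing the high-water mark to
two and opening the door to the lock-ordering question above.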

I also did some cleanups to be more consistent in how the
blocked_on_state is handled. I had a few spots previously where
I was cheating and just set the value instead of going through
the helpers. And sure enough, in fixing those I realized there
were a few spots where I wasn’t always holding the right
blocked_lock, so some minor rework helped clean that up.
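
To make the "always go through the helpers" point concrete, a
minimal sketch follows. It assumes a tri-state blocked_on_state
and a per-task blocked_lock; the bo_demo_* names and state names
are illustrative only and are not taken from the patches. In the
kernel such a helper would more likely use lockdep_assert_held()
than return an error, but a return value keeps the toy testable.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tri-state; names are illustrative, not from the patches. */
enum bo_demo_state { BO_DEMO_RUNNABLE, BO_DEMO_BLOCKED, BO_DEMO_WAKING };

struct bo_demo_task {
	bool blocked_lock_held;		/* toy stand-in for the spinlock */
	enum bo_demo_state state;
};

/*
 * Single entry point for state changes. Refuses the update (and leaves
 * the state untouched) when the caller does not hold the task's lock.
 */
static bool bo_demo_set_state(struct bo_demo_task *p, enum bo_demo_state next)
{
	if (!p->blocked_lock_held)
		return false;
	p->state = next;
	return true;
}
```

Funnelling every update through one function like this is what
makes a missing lock show up, where a bare assignment would hide it.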

I’m trying to submit this larger work in smallish digestible
pieces, so in this portion of the series, I’m only submitting
for review and consideration the logic that allows us to do
donor (blocked waiter) migration, which lets us proxy-execute
lock owners that might be on other cpu runqueues. This requires
some additional changes to locking and extra state tracking to
ensure we don’t accidentally run a migrated donor on a cpu it
isn’t affined to, as well as some extra handling to deal with
balance callback state that needs to be reset when we decide to
pick a different task after doing donor migration.
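
The affinity concern above can be sketched in a few lines. This
is a simplified user-space model, not the scheduler code: the
aff_* names are hypothetical, affinity is a plain 64-bit mask, and
"lowest allowed cpu" is just a placeholder for whatever cpu
selection the real return-migration path performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AFF_NR_CPUS 64

struct aff_task {
	uint64_t cpus_allowed;	/* bit n set: task may run on cpu n */
};

static bool aff_allowed_on(const struct aff_task *p, int cpu)
{
	if (cpu < 0 || cpu >= AFF_NR_CPUS)
		return false;
	return (p->cpus_allowed >> cpu) & 1;
}

/*
 * A donor that was migrated to this_cpu to boost a lock owner has become
 * runnable itself. Return the cpu it should run on: this_cpu if its
 * affinity permits, otherwise the lowest-numbered allowed cpu (standing
 * in for return migration), or -1 if the mask is empty.
 */
static int aff_pick_run_cpu(const struct aff_task *p, int this_cpu)
{
	int cpu;

	if (aff_allowed_on(p, this_cpu))
		return this_cpu;
	for (cpu = 0; cpu < AFF_NR_CPUS; cpu++)
		if (aff_allowed_on(p, cpu))
			return cpu;
	return -1;
}
```

For example, a donor allowed on cpus 0 and 2 that was migrated to
cpu 1 would be sent back to cpu 0 in this model instead of being
run where it currently sits.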

I’d love to get some feedback on any place where these patches
are confusing, or could use additional clarifications.

Also you can find the full proxy-exec series here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v21-6.17-rc2/
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v21-6.17-rc2

Issues still to address with the full series:
* Need to sort out what is needed for sched_ext to be ok with
  proxy-execution enabled. This is my next priority.

* K Prateek Nayak did some testing a bit over a year ago with an
  earlier version of the full series and saw ~3-5% regressions in
  some cases. Need to re-evaluate this now that the
  proxy-migration avoidance optimization Suleiman suggested has
  been implemented.

* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants (despite this being broken in vanilla
  upstream with RT_PUSH_IPI currently)

Future work:
* Expand to other locking primitives: Suleiman is looking at
  rw_semaphores, as that is another common source of priority
  inversion. Figuring out pi-futexes would be good too.
* Eventually: Work to replace rt_mutexes and get things happy
  with PREEMPT_RT

I’d really appreciate any feedback or review thoughts on the
full series as well. I’m trying to keep the chunks small,
reviewable and iteratively testable, but if you have any
suggestions on how to improve the series, I’m all ears.

Credit/Disclaimer:
--------------------
As always, this Proxy Execution series has a long history with
lots of developers that deserve credit: 

First described in a paper[2] by Watkins, Straub, and Niehaus,
then in patches from Peter Zijlstra, and extended with lots of
work by Juri Lelli, Valentin Schneider, and Connor O'Brien. (And
thank you to Steven Rostedt for providing additional details
here!)

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely mine.

Thanks so much!
-john

[1] https://lore.kernel.org/lkml/20250805001026.2247040-1-jstultz@google.com/
[2] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes <joelagnelf@...dia.com>
Cc: Qais Yousef <qyousef@...alina.io>   
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Zimuzo Ezeozue <zezeozue@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Will Deacon <will@...nel.org>
Cc: Waiman Long <longman@...hat.com>
Cc: Boqun Feng <boqun.feng@...il.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Metin Kaya <Metin.Kaya@....com>
Cc: Xuewen Yan <xuewen.yan94@...il.com>
Cc: K Prateek Nayak <kprateek.nayak@....com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: Suleiman Souhlal <suleiman@...gle.com>
Cc: kuyo chang <kuyo.chang@...iatek.com>
Cc: hupu <hupu.gm@...il.com>
Cc: kernel-team@...roid.com

John Stultz (5):
  locking: Add task::blocked_lock to serialize blocked_on state
  sched/locking: Add blocked_on_state to provide necessary tri-state for
    proxy return-migration
  sched: Add logic to zap balance callbacks if we pick again
  sched: Handle blocked-waiter migration (and return migration)
  sched: Migrate whole chain in proxy_migrate_task()

Peter Zijlstra (1):
  sched: Add blocked_donor link to task for smarter mutex handoffs

 include/linux/sched.h        | 120 ++++++++-----
 init/init_task.c             |   4 +
 kernel/fork.c                |   4 +
 kernel/locking/mutex-debug.c |   4 +-
 kernel/locking/mutex.c       |  83 +++++++--
 kernel/locking/ww_mutex.h    |  20 +--
 kernel/sched/core.c          | 329 +++++++++++++++++++++++++++++++++--
 kernel/sched/fair.c          |   3 +-
 kernel/sched/sched.h         |   2 +-
 9 files changed, 473 insertions(+), 96 deletions(-)

-- 
2.51.0.rc1.167.g924127e9c0-goog

