Message-ID: <20250926032931.27663-1-jstultz@google.com>
Date: Fri, 26 Sep 2025 03:29:08 +0000
From: John Stultz <jstultz@...gle.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: John Stultz <jstultz@...gle.com>, Joel Fernandes <joelagnelf@...dia.com>,
Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>,
Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>,
"Paul E. McKenney" <paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>,
Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>,
Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>,
Suleiman Souhlal <suleiman@...gle.com>, kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>,
kernel-team@...roid.com
Subject: [PATCH v22 0/6] Donor Migration for Proxy Execution (v22)
Hey All,
I wanted to continue pushing for feedback on the next chunk of
the series: Donor Migration.
This is the next step for Proxy Execution: it allows us to
migrate blocked donors across runqueues to boost remote lock
owners.
As always, I’m trying to submit this larger work in smallish,
digestible pieces. In this portion of the series, I’m only
submitting for review and consideration the logic that allows
donor (blocked waiter) migration. This requires some additional
changes to locking and extra state tracking to ensure we don’t
accidentally run a migrated donor on a CPU it isn’t affined to,
as well as some extra handling to reset balance callback state
when we decide to pick a different task after doing donor
migration.
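The extra state tracking described above can be pictured with a small userspace model: a blocked donor may be parked on any CPU's runqueue to boost a lock owner, but once it is woken it must be checked against its affinity mask, and moved back if needed, before it runs as itself. This is a minimal sketch, not the kernel implementation; the enum values, struct, fields, and sketch_* helpers are all hypothetical, loosely inspired by the "blocked_on_state ... tri-state for proxy return-migration" patch title in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 64	/* hypothetical: one bit per CPU in a uint64_t */

/* Hypothetical tri-state; names are illustrative, not the kernel's. */
enum sketch_bo_state {
	BO_RUNNABLE,	/* not blocked: may run as itself */
	BO_BLOCKED,	/* waiting on a mutex: usable only as a donor */
	BO_WAKING,	/* lock handed over: affinity must be rechecked first */
};

struct sketch_task {
	enum sketch_bo_state state;
	int cpu;		/* runqueue the task currently sits on */
	uint64_t allowed;	/* affinity mask: bit N set means CPU N allowed */
};

static bool sketch_cpu_allowed(const struct sketch_task *p, int cpu)
{
	return cpu >= 0 && cpu < SKETCH_NR_CPUS && ((p->allowed >> cpu) & 1);
}

/* Donor migration: a blocked donor may be moved to any CPU, allowed or not. */
static void sketch_donor_migrate(struct sketch_task *p, int owner_cpu)
{
	assert(owner_cpu >= 0 && owner_cpu < SKETCH_NR_CPUS);
	if (p->state == BO_BLOCKED)
		p->cpu = owner_cpu;
}

/*
 * Called when the scheduler would run @p as itself. A BO_BLOCKED task may
 * only donate. A BO_WAKING task parked on a disallowed CPU is first
 * return-migrated to its lowest allowed CPU, then becomes BO_RUNNABLE.
 */
static bool sketch_prepare_to_run(struct sketch_task *p)
{
	if (p->state == BO_BLOCKED)
		return false;
	if (p->state == BO_WAKING) {
		if (!sketch_cpu_allowed(p, p->cpu)) {
			int cpu = 0;

			while (cpu < SKETCH_NR_CPUS && !sketch_cpu_allowed(p, cpu))
				cpu++;
			p->cpu = cpu;	/* return migration (SKETCH_NR_CPUS if mask is empty) */
		}
		p->state = BO_RUNNABLE;
	}
	return sketch_cpu_allowed(p, p->cpu);
}
```

In this model the middle state is what makes a tri-state necessary: a plain blocked/not-blocked flag cannot tell apart a woken donor still sitting on a foreign runqueue from one that is safe to run where it is.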
My last version got a lot of great feedback from K Prateek Nayak,
which, while not significantly changing behavior, did have me
reworking and reorganizing quite a bit of code in this series:
* Reworking find_proxy_task() to avoid mixing gotos with guard()
usage. Instead, we break out of the loop and switch() on an
action enum set inside it.
* Zap callbacks when we resched idle
* Remove unjustified curr != donor check in pick_next_task_fair()
* Simplifications around put_prev_set_next() in the migration
logic
* Reorder functions for readability
* Move a few task_struct elements under #ifdef
CONFIG_SCHED_PROXY_EXEC
* Switch to one-line stubs and other whitespace and spelling
cleanups.
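The find_proxy_task() rework in the first item above replaces goto with "record an action, break, then switch()". A minimal userspace sketch of that pattern follows, assuming GCC or Clang for __attribute__((cleanup)), the mechanism the kernel's guard() is built on. Every name here (SKETCH_GUARD, sketch_find_proxy(), the action values, the -1 "not on a runqueue" convention) is hypothetical; this is not the patch's find_proxy_task().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock plus a global count of held locks, for checking. */
struct sketch_lock { int held; };
static int sketch_locks_held;

static void sketch_lock_acquire(struct sketch_lock *l)
{
	l->held = 1;
	sketch_locks_held++;
}

/* Cleanup handler: receives a pointer to the guard variable. */
static void sketch_guard_release(struct sketch_lock **lp)
{
	(*lp)->held = 0;
	sketch_locks_held--;
}

/* Scope-based guard: the lock is dropped whenever `var` leaves scope. */
#define SKETCH_GUARD(var, lockp)                                          \
	struct sketch_lock *var                                           \
		__attribute__((cleanup(sketch_guard_release), unused)) = \
		(sketch_lock_acquire(lockp), (lockp))

struct sketch_task {
	struct sketch_lock blocked_lock;
	struct sketch_task *blocked_on;	/* next task in chain; NULL = lock owner */
	int cpu;			/* -1 means "not on any runqueue" */
};

enum sketch_action { SKETCH_RUN_OWNER, SKETCH_MIGRATE, SKETCH_DEACTIVATE };

/* Walk @donor's (assumed acyclic) chain: decide under guards, act after. */
static enum sketch_action sketch_find_proxy(struct sketch_task *donor,
					    int this_cpu,
					    struct sketch_task **owner)
{
	enum sketch_action action;
	struct sketch_task *p = donor;

	for (;;) {
		SKETCH_GUARD(g, &p->blocked_lock);

		if (!p->blocked_on) {		/* reached the lock owner */
			*owner = p;
			if (p->cpu < 0)
				action = SKETCH_DEACTIVATE;
			else if (p->cpu == this_cpu)
				action = SKETCH_RUN_OWNER;
			else
				action = SKETCH_MIGRATE;
			break;			/* guard releases p->blocked_lock */
		}
		p = p->blocked_on;		/* old guard released at end of body */
	}

	assert(sketch_locks_held == 0);		/* no lock held while acting */
	switch (action) {
	case SKETCH_RUN_OWNER:			/* run *owner on donor's behalf */
		break;
	case SKETCH_MIGRATE:			/* donor migration toward the owner */
		donor->cpu = (*owner)->cpu;
		break;
	case SKETCH_DEACTIVATE:			/* owner asleep: dequeue donor too */
		donor->cpu = -1;
		break;
	}
	return action;
}
```

Because every exit from the guarded loop is a plain break, each lock is released by its guard before the switch() acts on the decision, avoiding the goto/cleanup mixing that the kernel's cleanup.h documentation warns against.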
I’d love to get further feedback on any place where these patches
are confusing, or could use additional clarifications.
Also Suleiman Souhlal and I have been working on some
enhancements to the full Proxy Execution series:
* Suleiman has implemented a first pass at enabling Proxy Exec
on rw_sems! These have been another common source of priority
inversion problems, so I’m excited that the Proxy Exec approach
may help solve those issues as well. More work and validation
are required, but it’s very exciting!
* I’ve been working to allow Proxy Exec to work with sched_ext.
I’ve now worked out the crashers I was initially seeing.
However, my stress tests still tend to eventually cause
problems, though unfortunately this seems to be the case
without proxy-exec as well, and appears to be due to the
missing dl_server for sched_ext. I need to test with Andrea
Righi’s series here:
https://lore.kernel.org/lkml/20250903095008.162049-1-arighi@nvidia.com/
I still have further work to do to better understand whether
Proxy switching the selected task breaks BPF scheduler
assumptions, and what might be done about it.
You can also find the full proxy-exec series here:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v22-6.17-rc6
https://github.com/johnstultz-work/linux-dev.git proxy-exec-v22-6.17-rc6
Issues still to address with the full series:
* Continue working to get sched_ext to be ok with
proxy-execution enabled.
* K Prateek Nayak re-ran some performance testing with both this
set and the full series, and while the set I’m submitting here
looked ok, the full series did see regressions. I’m working to
reproduce this so I can narrow the issue down.
* The chain migration functionality needs further iterations and
better validation to ensure it truly maintains the RT/DL load
balancing invariants (even though these are currently broken in
vanilla upstream with RT_PUSH_IPI).
Future work:
* Expand to more locking primitives; figuring out pi-futexes
would be good too.
* Eventually: work to replace rt_mutexes and get things happy
with PREEMPT_RT.
I’d really appreciate any feedback or review thoughts on the
full series as well. I’m trying to keep the chunks small,
reviewable and iteratively testable, but if you have any
suggestions on how to improve the larger series, I’m all ears.
Credit/Disclaimer:
------------------
As always, this Proxy Execution series has a long history with
lots of developers that deserve credit:
First described in a paper[2] by Watkins, Straub, and Niehaus,
then implemented in patches from Peter Zijlstra, and extended with
lots of work by Juri Lelli, Valentin Schneider, and Connor O'Brien
(and thank you to Steven Rostedt for providing additional details
here!).
So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely
mine.
Thanks so much!
-john
[1] https://lore.kernel.org/lkml/20250805001026.2247040-1-jstultz@google.com/
[2] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf
Cc: Joel Fernandes <joelagnelf@...dia.com>
Cc: Qais Yousef <qyousef@...alina.io>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Zimuzo Ezeozue <zezeozue@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Will Deacon <will@...nel.org>
Cc: Waiman Long <longman@...hat.com>
Cc: Boqun Feng <boqun.feng@...il.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Metin Kaya <Metin.Kaya@....com>
Cc: Xuewen Yan <xuewen.yan94@...il.com>
Cc: K Prateek Nayak <kprateek.nayak@....com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: Suleiman Souhlal <suleiman@...gle.com>
Cc: kuyo chang <kuyo.chang@...iatek.com>
Cc: hupu <hupu.gm@...il.com>
Cc: kernel-team@...roid.com
John Stultz (5):
locking: Add task::blocked_lock to serialize blocked_on state
sched/locking: Add blocked_on_state to provide necessary tri-state for
proxy return-migration
sched: Add logic to zap balance callbacks if we pick again
sched: Handle blocked-waiter migration (and return migration)
sched: Migrate whole chain in proxy_migrate_task()
Peter Zijlstra (1):
sched: Add blocked_donor link to task for smarter mutex handoffs
include/linux/sched.h | 130 ++++++++++----
init/init_task.c | 6 +
kernel/fork.c | 7 +-
kernel/locking/mutex-debug.c | 4 +-
kernel/locking/mutex.c | 86 +++++++--
kernel/locking/ww_mutex.h | 20 +--
kernel/sched/core.c | 339 ++++++++++++++++++++++++++++++++---
kernel/sched/sched.h | 6 +-
8 files changed, 507 insertions(+), 91 deletions(-)
--
2.51.0.536.g15c5d4f767-goog