Message-ID: <20251124223111.3616950-1-jstultz@google.com>
Date: Mon, 24 Nov 2025 22:30:52 +0000
From: John Stultz <jstultz@...gle.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: John Stultz <jstultz@...gle.com>, Joel Fernandes <joelagnelf@...dia.com>,
Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>,
Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>,
"Paul E. McKenney" <paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>,
Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>,
Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>,
Suleiman Souhlal <suleiman@...gle.com>, kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>,
kernel-team@...roid.com
Subject: [PATCH v24 00/11] Donor Migration for Proxy Execution (v24)
Hey All,
Yet another iteration on the next chunk of the Proxy Exec
series: Donor Migration.
This is just the next step for Proxy Execution, to allow us to
migrate blocked donors across runqueues to boost remote lock
owners.
In this portion of the series, I’m only submitting for review
and consideration the logic that allows us to do donor
(blocked waiter) migration, which requires some additional
changes to locking and extra state tracking to ensure we don’t
accidentally run a migrated donor on a cpu it isn’t affined to,
as well as some extra handling to deal with balance callback
state that needs to be reset when we decide to pick a different
task after doing donor migration.
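
The affinity concern above can be illustrated with a small userspace C sketch. This is not code from the series: the toy_* names, the bitmask affinity model, and the home_cpu field are all assumptions made for the example. It only shows the general idea that a blocked donor may sit on a runqueue outside its affinity mask, and must be sent back before it is allowed to run itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy stand-in for a task; NOT the kernel's task_struct. */
struct toy_task {
	unsigned long allowed_mask; /* bit N set => task may run on cpu N */
	int cpu;                    /* runqueue the task is currently queued on */
	int home_cpu;               /* hypothetical: last cpu it was runnable on */
	bool blocked;               /* true while waiting on a lock */
};

static bool toy_cpu_allowed(const struct toy_task *p, int cpu)
{
	if (cpu < 0 || cpu >= (int)(sizeof(unsigned long) * CHAR_BIT))
		return false;
	return (p->allowed_mask >> cpu) & 1UL;
}

/*
 * Donor migration: queue the blocked donor next to the lock owner,
 * ignoring affinity. home_cpu is deliberately left untouched.
 */
static void toy_donor_migrate(struct toy_task *donor, int owner_cpu)
{
	donor->cpu = owner_cpu;
}

/*
 * The donor stops being blocked. If it sits on a cpu it is not affined
 * to, move it back (return migration) before it may run.
 * Returns the cpu the task ends up on.
 */
static int toy_unblock(struct toy_task *p)
{
	p->blocked = false;
	if (!toy_cpu_allowed(p, p->cpu))
		p->cpu = p->home_cpu;
	return p->cpu;
}
```

In this model a donor allowed only on cpus 0 and 1 that was migrated to cpu 3 ends up back on its home cpu once unblocked, while a donor that is allowed on cpu 3 simply stays there.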
In the last iteration, K Prateek provided some really great
review feedback, so I’ve tried to integrate all of his suggested
cleanups and improvements. Many thanks again to K Prateek!
Additionally, in my continued efforts to make Proxy Execution
and sched_ext play well together, I realized a bug I saw with
sched_ext was actually a larger issue around the sched class
implementations’ assumption that the “prev” argument passed in
from __schedule() is stable across rq lock drops. Without Proxy
Exec, “prev” is always “current” and is on the cpu, so this
assumption held, but with Proxy Exec, “prev” is “rq->donor”,
and if the rq lock is dropped, the rq->donor may be woken up on
another cpu and be return-migrated away, with rq->donor being
set to idle. So I’ve gone through the sched classes for both
pick_next_task() and prev_balance() and removed the prev
argument, reworking the functions to sample rq->donor instead,
particularly after a rq lock drop.
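
The stale-prev hazard can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the toy_* types and functions are hypothetical stand-ins (not the kernel's struct rq, task_struct, or sched class hooks), and the "remote wakeup" is simulated by a callback that runs while the toy lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; NOT the kernel's task_struct / struct rq. */
struct toy_task {
	const char *name;
	int cpu;
};

struct toy_rq {
	int cpu;
	bool locked;
	struct toy_task *donor; /* models rq->donor */
	struct toy_task *idle;  /* models the rq's idle task */
};

/* Hypothetical hook: what another cpu does while our lock is dropped. */
typedef void (*toy_remote_fn)(struct toy_rq *rq);

/* The donor is woken elsewhere and migrated away; rq->donor becomes idle. */
static void toy_remote_wakeup(struct toy_rq *rq)
{
	rq->donor->cpu = rq->cpu + 1;
	rq->donor = rq->idle;
}

static void toy_drop_and_retake_lock(struct toy_rq *rq, toy_remote_fn remote)
{
	rq->locked = false;
	if (remote)
		remote(rq);
	rq->locked = true;
}

/* Old shape: the caller's prev is still used after the lock drop. */
static struct toy_task *toy_balance_with_prev(struct toy_rq *rq,
					      struct toy_task *prev,
					      toy_remote_fn remote)
{
	toy_drop_and_retake_lock(rq, remote);
	return prev; /* may no longer be on this rq */
}

/* New shape: no prev argument; re-sample rq->donor once the lock is held. */
static struct toy_task *toy_balance_resample(struct toy_rq *rq,
					     toy_remote_fn remote)
{
	toy_drop_and_retake_lock(rq, remote);
	return rq->donor;
}
```

With the old shape the returned task can point at something that now lives on a different cpu, while the re-sampling variant always agrees with the runqueue's current donor.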
New in this iteration:
* Reworking pick_next_task() and prev_balance() to not pass the
prev argument, which might go stale across rq lock drops
* Change to avoid a null pointer traversal when a task calls
yield while rq->donor is idle.
* _Lots_ of cleanups and improvements suggested by K Prateek.
* Fix for edge case where select_task_rq() chooses the current
cpu and we don’t call set_task_cpu(), which caused wake_cpu to
go stale
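
To illustrate the yield item in the list above: if the yield path dispatches through the donor's sched class and that class provides no yield hook, an unguarded call jumps through a NULL function pointer. The sketch below is a userspace toy, not the actual patch. The toy_* names are hypothetical, and the idea that an idle-like class leaves its yield hook unset is an assumption of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rq;

/* Toy sched class with an optional yield hook. */
struct toy_sched_class {
	void (*yield_task)(struct toy_rq *rq); /* may be NULL */
};

struct toy_task {
	const struct toy_sched_class *sched_class;
};

struct toy_rq {
	struct toy_task *donor; /* models rq->donor */
	int yields;             /* counts how often a yield hook ran */
};

static void toy_fair_yield(struct toy_rq *rq)
{
	rq->yields++;
}

static const struct toy_sched_class toy_fair_class = {
	.yield_task = toy_fair_yield,
};

/* Assumption for this sketch: the idle-like class has no yield hook. */
static const struct toy_sched_class toy_idle_class = {
	.yield_task = NULL,
};

/*
 * Dispatch yield through the donor's class, but only if the hook exists.
 * Returns true if a hook was invoked.
 */
static bool toy_do_yield(struct toy_rq *rq)
{
	const struct toy_sched_class *class = rq->donor->sched_class;

	if (!class->yield_task)
		return false;
	class->yield_task(rq);
	return true;
}
```

When the donor is the idle-like task, toy_do_yield() simply reports that nothing was called instead of dereferencing the missing hook.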
I’d love to get further feedback on any place where these
patches are confusing, or could use additional clarifications.
In the full series, there’s a number of fixes for issues found
enabling and testing with sched_ext, along with another revision
of Suleiman’s rwsem support. I’d appreciate any testing or
comments that folks have with the full set:
You can find the full Proxy Exec series here:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v24-6.18-rc6
https://github.com/johnstultz-work/linux-dev.git proxy-exec-v24-6.18-rc6
Issues still to address with the full series:
* Continue working to get sched_ext to be ok with Proxy
Execution enabled.
* I’ve reproduced the performance regression K Prateek Nayak
found with the full series. I’m hoping to work to understand
and narrow the issue down soon.
* The chain migration functionality needs further iterations
and better validation to ensure it truly maintains the RT/DL
load balancing invariants (despite this being broken in
vanilla upstream with RT_PUSH_IPI currently)
Future work:
* Expand to more locking primitives: Figuring out pi-futexes
would be good, and using proxy for Binder PI is something else
we’re exploring.
* Eventually: Work to replace rt_mutexes and get things happy
with PREEMPT_RT
I’d really appreciate any feedback or review thoughts on the
full series as well. I’m trying to keep the chunks small,
reviewable and iteratively testable, but if you have any
suggestions on how to improve the larger series, I’m all ears.
Credit/Disclaimer:
--------------------
As always, this Proxy Execution series has a long history with
lots of developers that deserve credit:
It was first described in a paper[1] by Watkins, Straub, and
Niehaus, then implemented in patches from Peter Zijlstra, and
extended with lots of work by Juri Lelli, Valentin Schneider,
and Connor O'Brien (and thank you to Steven Rostedt for
providing additional details here!).
Thanks also to Joel Fernandes, Dietmar Eggemann, Metin Kaya,
K Prateek Nayak and Suleiman Souhlal for their substantial
review, suggestion, and patch contributions.
So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are surely
mine.
Thanks so much!
-john
[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf
Cc: Joel Fernandes <joelagnelf@...dia.com>
Cc: Qais Yousef <qyousef@...alina.io>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Zimuzo Ezeozue <zezeozue@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Will Deacon <will@...nel.org>
Cc: Waiman Long <longman@...hat.com>
Cc: Boqun Feng <boqun.feng@...il.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Metin Kaya <Metin.Kaya@....com>
Cc: Xuewen Yan <xuewen.yan94@...il.com>
Cc: K Prateek Nayak <kprateek.nayak@....com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: Suleiman Souhlal <suleiman@...gle.com>
Cc: kuyo chang <kuyo.chang@...iatek.com>
Cc: hupu <hupu.gm@...il.com>
Cc: kernel-team@...roid.com
John Stultz (10):
  locking: Add task::blocked_lock to serialize blocked_on state
  sched: Fix modifying donor->blocked on without proper locking
  sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy
    return-migration
  sched: Add assert_balance_callbacks_empty helper
  sched: Add logic to zap balance callbacks if we pick again
  sched: Handle blocked-waiter migration (and return migration)
  sched: Rework pick_next_task() and prev_balance() to avoid stale prev
    references
  sched: Avoid donor->sched_class->yield_task() null traversal
  sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING
    case
  sched: Migrate whole chain in proxy_migrate_task()
Peter Zijlstra (1):
  sched: Add blocked_donor link to task for smarter mutex handoffs
 include/linux/sched.h        |  95 +++++---
 init/init_task.c             |   5 +
 kernel/fork.c                |   5 +
 kernel/locking/mutex-debug.c |   4 +-
 kernel/locking/mutex.c       |  82 +++++--
 kernel/locking/mutex.h       |   6 +
 kernel/locking/ww_mutex.h    |  16 +-
 kernel/sched/core.c          | 418 +++++++++++++++++++++++++++++++----
 kernel/sched/deadline.c      |   8 +-
 kernel/sched/ext.c           |   8 +-
 kernel/sched/fair.c          |  15 +-
 kernel/sched/idle.c          |   2 +-
 kernel/sched/rt.c            |   8 +-
 kernel/sched/sched.h         |  17 +-
 kernel/sched/stop_task.c     |   2 +-
 kernel/sched/syscalls.c      |   3 +-
 16 files changed, 582 insertions(+), 112 deletions(-)
--
2.52.0.487.g5c8c507ade-goog