Message-ID: <20251124223111.3616950-1-jstultz@google.com>
Date: Mon, 24 Nov 2025 22:30:52 +0000
From: John Stultz <jstultz@...gle.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: John Stultz <jstultz@...gle.com>, Joel Fernandes <joelagnelf@...dia.com>, 
	Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Valentin Schneider <vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>, 
	Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Suleiman Souhlal <suleiman@...gle.com>, kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>, 
	kernel-team@...roid.com
Subject: [PATCH v24 00/11] Donor Migration for Proxy Execution (v24)

Hey All,

Yet another iteration on the next chunk of the Proxy Exec
series: Donor Migration.

This is just the next step for Proxy Execution, to allow us to
migrate blocked donors across runqueues to boost remote lock
owners.

In this portion of the series, I’m only submitting for review
and consideration the logic that allows us to do donor 
(blocked waiter) migration, which requires some additional
changes to locking and extra state tracking to ensure we don’t
accidentally run a migrated donor on a cpu it isn’t affined to,
as well as some extra handling to deal with balance callback
state that needs to be reset when we decide to pick a different
task after doing donor migration.
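
To make the affinity concern above a bit more concrete, here is a
small userspace sketch. It is only a toy model: every name in it
(toy_task, toy_migrate_donor(), toy_wake_donor(), the 64-bit
allowed_cpus mask and the home_cpu argument) is hypothetical and
is not taken from the patches. It just shows that a blocked donor
can be parked on a cpu outside its affinity, and that something
has to check and move it back before it actually runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a task; NOT the kernel's struct task_struct. */
struct toy_task {
	uint64_t allowed_cpus;	/* bit n set => task may execute on cpu n */
	int cpu;		/* runqueue the task currently sits on */
};

/* True if the task is allowed to execute on @cpu. */
bool toy_may_run_on(const struct toy_task *t, int cpu)
{
	return cpu >= 0 && cpu < 64 && ((t->allowed_cpus >> cpu) & 1u);
}

/*
 * Hypothetical donor migration: park a blocked donor on the lock
 * owner's runqueue so its scheduling context can boost the owner.
 * Affinity is deliberately not checked here, because the blocked
 * donor does not execute its own code while it waits.
 */
void toy_migrate_donor(struct toy_task *donor, int owner_cpu)
{
	donor->cpu = owner_cpu;
}

/*
 * Hypothetical wakeup: the donor is about to run for real, so it
 * may only stay where it is if its affinity allows it. Otherwise
 * it is return-migrated to @home_cpu. Returns the cpu it ends on.
 */
int toy_wake_donor(struct toy_task *donor, int home_cpu)
{
	if (!toy_may_run_on(donor, donor->cpu))
		donor->cpu = home_cpu;	/* return migration */
	return donor->cpu;
}
```

In this model, a donor affined only to cpu 0 that was migrated to
cpu 3 ends up back on cpu 0 at wakeup, while a donor that is also
allowed on cpu 3 simply stays there.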

In the last iteration, K Prateek provided some really great
review feedback, so I’ve tried to integrate all of his suggested
cleanups and improvements. Many thanks again to K Prateek!

Additionally, in my continued efforts to make Proxy Execution
and sched_ext play well together, I realized a bug I saw with
sched_ext was actually a larger issue: the sched class
implementations assume that the “prev” argument passed in
from __schedule() is stable across rq lock drops. Without Proxy
Exec, “prev” is always “current” and is on the cpu, so this
assumption held. But with Proxy Exec, “prev” is “rq->donor”,
and if the rq lock is dropped, the rq->donor may be woken up on
another cpu and be return-migrated away, with rq->donor being
set to idle. So I’ve gone through the class schedulers for both
pick_next_task() and prev_balance() and removed the prev
argument, reworking the functions to sample rq->donor instead,
particularly after a rq lock drop.
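
As a rough illustration of the stale “prev” problem, here is a
userspace toy model. None of the names below (toy_rq,
toy_lock_dropped(), toy_balance_old(), toy_balance_new()) exist
in the kernel; they are made up for this sketch, and the lock
drop is modelled simply as the point where rq->donor gets reset
to idle.

```c
#include <assert.h>

/* Toy stand-ins; NOT the kernel's struct task_struct / struct rq. */
struct toy_task {
	const char *name;
};

struct toy_rq {
	struct toy_task *donor;	/* currently selected scheduling context */
	struct toy_task *idle;	/* this runqueue's idle task */
};

/*
 * Models what can happen while the rq lock is dropped: the donor
 * is woken on another cpu and leaves, so rq->donor becomes idle.
 */
void toy_lock_dropped(struct toy_rq *rq)
{
	rq->donor = rq->idle;
}

/* Old shape: the caller sampled prev before the lock drop. */
struct toy_task *toy_balance_old(struct toy_rq *rq, struct toy_task *prev)
{
	toy_lock_dropped(rq);
	return prev;		/* may no longer belong to this rq */
}

/* New shape: no prev argument; rq->donor is re-read after the drop. */
struct toy_task *toy_balance_new(struct toy_rq *rq)
{
	toy_lock_dropped(rq);
	return rq->donor;
}

/* Small scenario: returns 1 if the old shape hands back a stale task. */
int toy_old_shape_is_stale(void)
{
	struct toy_task waiter = { "waiter" }, idle = { "idle" };
	struct toy_rq rq = { .donor = &waiter, .idle = &idle };
	struct toy_task *seen = toy_balance_old(&rq, rq.donor);

	return seen != rq.donor;
}
```

The old shape keeps operating on the task that was the donor
before the lock drop, while the new shape always sees whatever
rq->donor is once the lock is held again.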

New in this iteration:
* Reworking pick_next_task() and prev_balance() to not pass the
  prev argument, which might go stale across rq lock drops
* Change to avoid a null ptr traversal when a task calls yield
  while rq->donor is idle.
* _Lots_ of cleanups and improvements suggested by K Prateek.
* Fix for edge case where select_task_rq() chooses the current
  cpu and we don’t call set_task_cpu(), which caused wake_cpu to
  go stale 
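
For the yield item in the list above, here is a toy userspace
sketch of the general pattern. It assumes the idle class simply
leaves its yield hook unset, and it guards the call with a NULL
check; the actual patch may avoid the traversal differently. All
names (toy_sched_class, toy_do_yield(), etc.) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rq;

/* Toy stand-in for a sched class with an optional yield hook. */
struct toy_sched_class {
	void (*yield_task)(struct toy_rq *rq);	/* may be NULL */
};

struct toy_task {
	const struct toy_sched_class *sched_class;
};

struct toy_rq {
	struct toy_task *donor;	/* currently selected scheduling context */
	int yields;		/* how many times a yield hook ran */
};

/* A class that does implement yield. */
void toy_fair_yield(struct toy_rq *rq)
{
	rq->yields++;
}

const struct toy_sched_class toy_fair_class = { .yield_task = toy_fair_yield };
/* Assumption for this sketch: the idle class has no yield hook. */
const struct toy_sched_class toy_idle_class = { .yield_task = NULL };

/*
 * Yield via the donor's class, skipping classes without a hook.
 * Returns 1 if the hook ran, 0 if it was skipped.
 */
int toy_do_yield(struct toy_rq *rq)
{
	const struct toy_sched_class *sc = rq->donor->sched_class;

	if (!sc->yield_task)
		return 0;
	sc->yield_task(rq);
	return 1;
}
```

With a fair-class donor the hook runs as usual; once rq->donor
is the idle task, the call is skipped instead of jumping through
a NULL function pointer.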

I’d love to get further feedback on any place where these
patches are confusing, or could use additional clarifications.

In the full series, there are a number of fixes for issues found
while enabling and testing with sched_ext, along with another
revision of Suleiman’s rwsem support. I’d appreciate any testing
or comments that folks have with the full set:

You can find the full Proxy Exec series here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v24-6.18-rc6
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v24-6.18-rc6

Issues still to address with the full series:
* Continue working to get sched_ext to be ok with Proxy
  Execution enabled.
* I’ve reproduced the performance regression K Prateek Nayak
  found with the full series. I’m hoping to understand and
  narrow the issue down soon.
* The chain migration functionality needs further iterations
  and better validation to ensure it truly maintains the RT/DL
  load balancing invariants (despite this being broken in
  vanilla upstream with RT_PUSH_IPI currently)

Future work:
* Expand to more locking primitives: Figuring out pi-futexes
  would be good, using proxy for Binder PI is something else
  we’re exploring.
* Eventually: Work to replace rt_mutexes and get things happy
  with PREEMPT_RT

I’d really appreciate any feedback or review thoughts on the
full series as well. I’m trying to keep the chunks small,
reviewable and iteratively testable, but if you have any
suggestions on how to improve the larger series, I’m all ears.

Credit/Disclaimer:
--------------------
As always, this Proxy Execution series has a long history with
lots of developers that deserve credit: 

It was first described in a paper[1] by Watkins, Straub, and
Niehaus, followed by patches from Peter Zijlstra, and extended
with lots of work by Juri Lelli, Valentin Schneider, and Connor
O'Brien (and thank you to Steven Rostedt for providing
additional details here!).
Thanks also to Joel Fernandes, Dietmar Eggemann, Metin Kaya,
K Prateek Nayak and Suleiman Souhlal for their substantial
review, suggestion, and patch contributions.

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are surely
mine.

Thanks so much!
-john

[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes <joelagnelf@...dia.com>
Cc: Qais Yousef <qyousef@...alina.io>   
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Zimuzo Ezeozue <zezeozue@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Will Deacon <will@...nel.org>
Cc: Waiman Long <longman@...hat.com>
Cc: Boqun Feng <boqun.feng@...il.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Metin Kaya <Metin.Kaya@....com>
Cc: Xuewen Yan <xuewen.yan94@...il.com>
Cc: K Prateek Nayak <kprateek.nayak@....com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: Suleiman Souhlal <suleiman@...gle.com>
Cc: kuyo chang <kuyo.chang@...iatek.com>
Cc: hupu <hupu.gm@...il.com>
Cc: kernel-team@...roid.com


John Stultz (10):
  locking: Add task::blocked_lock to serialize blocked_on state
  sched: Fix modifying donor->blocked on without proper locking
  sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy
    return-migration
  sched: Add assert_balance_callbacks_empty helper
  sched: Add logic to zap balance callbacks if we pick again
  sched: Handle blocked-waiter migration (and return migration)
  sched: Rework pick_next_task() and prev_balance() to avoid stale prev
    references
  sched: Avoid donor->sched_class->yield_task() null traversal
  sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING
    case
  sched: Migrate whole chain in proxy_migrate_task()

Peter Zijlstra (1):
  sched: Add blocked_donor link to task for smarter mutex handoffs

 include/linux/sched.h        |  95 +++++---
 init/init_task.c             |   5 +
 kernel/fork.c                |   5 +
 kernel/locking/mutex-debug.c |   4 +-
 kernel/locking/mutex.c       |  82 +++++--
 kernel/locking/mutex.h       |   6 +
 kernel/locking/ww_mutex.h    |  16 +-
 kernel/sched/core.c          | 418 +++++++++++++++++++++++++++++++----
 kernel/sched/deadline.c      |   8 +-
 kernel/sched/ext.c           |   8 +-
 kernel/sched/fair.c          |  15 +-
 kernel/sched/idle.c          |   2 +-
 kernel/sched/rt.c            |   8 +-
 kernel/sched/sched.h         |  17 +-
 kernel/sched/stop_task.c     |   2 +-
 kernel/sched/syscalls.c      |   3 +-
 16 files changed, 582 insertions(+), 112 deletions(-)

-- 
2.52.0.487.g5c8c507ade-goog

