Date: Thu, 14 Mar 2024 21:39:45 -0700
From: John Stultz <jstultz@...gle.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: John Stultz <jstultz@...gle.com>, Joel Fernandes <joelaf@...gle.com>, 
	Qais Yousef <qyousef@...gle.com>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Valentin Schneider <vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, 
	Youssef Esmat <youssefesmat@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Daniel Bristot de Oliveira <bristot@...hat.com>, Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>, 
	Boqun Feng <boqun.feng@...il.com>, "Paul E. McKenney" <paulmck@...nel.org>, 
	Metin Kaya <Metin.Kaya@....com>, Xuewen Yan <xuewen.yan94@...il.com>, 
	K Prateek Nayak <kprateek.nayak@....com>, Thomas Gleixner <tglx@...utronix.de>, kernel-team@...roid.com
Subject: [PATCH v9 0/7] Preparatory changes for Proxy Execution v9

As mentioned last time[1], after previous submissions of the
Proxy Execution series I got feedback that the patch series was
getting a bit unwieldy to review. Qais suggested I break out
just the cleanup/preparatory components and submit them on their
own, in the hope that we can start to merge the less complex
bits and focus discussion on the more complicated portions
afterwards. So far this has not been very successful: the
submission & RESEND of the v8 preparatory changes did not get
much in the way of review.

Nonetheless, for v9 of this series, I’m again only submitting
those early cleanup/preparatory changes here (which have not
changed since the v8 submissions, but to avoid confusion with the
git branch names, I’m labeling it as v9). In the meantime, I’ve
continued to put a lot of effort into the full series, mostly
focused on polishing the series for correctness and fixing some
hard-to-trip races.

If you are interested, the full v9 series can be found here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v9-6.8
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v9-6.8


New in v9:
(In the git tree. Again, none of the preparatory patches
submitted here have changed since v8)
---------
* Change to force mutex lock handoff when we have a blocked donor
  (preserves optimistic spinning elsewhere, but still prioritizes
  donor when present on unlock)

* Do return migration whenever we’re not on the wake_cpu (should
  address placement concerns brought up earlier by Xuewen Yan)

* Closed hole where we might mark a task as BO_RUNNABLE without
  doing return migration

* Much improved handling of balance callbacks when we need to
  pick_again

* Fixes for cases where we put_prev_task() but left a dangling
  pointer to rq_selected() when deactivating a task (as it could
  then be migrated away while we still have a reference to it),
  by selecting idle before deactivating next.

* Fixes for dangling references to rq->curr (which had been
  put_prev_task’ed) when we drop rq lock for proxy_migration

* Fixes for ttwu / find_proxy_task() races if the lock owner was
  being return migrated, and ttwu hadn’t yet set_task_cpu() and
  activated it, which allowed that task to be scheduled on two
  cpus at the same time.

* Fix for live-lock between activate_blocked_tasks() and
  proxy_enqueue_on_owner() if activated owner went right back to
  sleep (which also simplifies the locking in
  activate_blocked_tasks())

* Cleanups to avoid locked BO_WAKING->BO_RUNNABLE transition in
  try_to_wake_up() if proxy execution isn't enabled

* Fix for psi_dequeue, as proxy changes assumptions around
  voluntary sleeps.

* Numerous typos, comment improvements, and other fixups
  suggested by Metin

* And more!
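
For readers new to the series, several of the items above
(rq_selected() vs. rq->curr, find_proxy_task(), lock owners)
revolve around one idea: the task the scheduler selects may be
blocked on a mutex, in which case the lock owner is run in its
place. The following is only a toy userspace sketch of that
idea, not code from the series. All toy_* names and structures
are hypothetical, and it ignores locking, migration, and
sleeping owners entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; these are NOT the kernel's structures. */
struct toy_mutex;

struct toy_task {
	const char *name;
	struct toy_mutex *blocked_on;	/* NULL when the task is runnable */
};

struct toy_mutex {
	struct toy_task *owner;		/* NULL when the mutex is unlocked */
};

struct toy_rq {
	struct toy_task *selected;	/* scheduler context: the task picked */
	struct toy_task *curr;		/* execution context: the task that runs */
};

#define TOY_MAX_CHAIN 16		/* arbitrary bound for the sketch */

/*
 * Follow donor -> mutex -> owner links until a runnable task is
 * found. Returns NULL if a mutex in the chain has no owner, or if
 * the chain is longer than TOY_MAX_CHAIN (for example, a cycle).
 */
static struct toy_task *toy_find_proxy(struct toy_task *donor)
{
	struct toy_task *p = donor;
	int depth;

	assert(donor != NULL);
	for (depth = 0; depth < TOY_MAX_CHAIN; depth++) {
		if (!p->blocked_on)
			return p;	/* runnable, so it can execute */
		p = p->blocked_on->owner;
		if (!p)
			return NULL;	/* no owner: caller should pick again */
	}
	return NULL;
}

/*
 * Keep the donor as the scheduler context and run the proxy as the
 * execution context. Returns 0 on success, -1 if no proxy was found.
 */
static int toy_pick(struct toy_rq *rq, struct toy_task *donor)
{
	struct toy_task *proxy = toy_find_proxy(donor);

	if (!proxy)
		return -1;
	rq->selected = donor;
	rq->curr = proxy;
	return 0;
}
```

With a chain A -> m1 -> B -> m2 -> C where only C is runnable,
toy_pick() leaves A as rq->selected and C as rq->curr. Keeping
the two pointers separate is what makes the dangling-reference
cases listed above possible in the first place.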


Performance:
---------
K Prateek Nayak provided some feedback on the v8 series here[2].
Given the potential extra overhead of doing rq migrations/return
migrations/etc for the proxy case, it’s not completely surprising
a few of K Prateek’s test cases saw ~3-5% regressions, but I’m
hoping to look into this soon to see if we can reduce those
further. The donor mutex handoff in this revision may help some.
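
To make the donor handoff a bit more concrete, here is a minimal
userspace sketch of the unlock-time decision, under the
assumption that the mutex can see whether a blocked donor is
waiting on it. The demo_* names and the "donor" field are
hypothetical and do not reflect how the series stores this
state. The real mutex code is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; NOT the kernel's task_struct or struct mutex. */
struct demo_task {
	const char *name;
};

struct demo_mutex {
	struct demo_task *owner;	/* NULL when the lock is free */
	struct demo_task *donor;	/* hypothetical: blocked donor, or NULL */
};

/*
 * Unlock: if a blocked donor is waiting, hand the lock straight to
 * it. Otherwise release the lock so that any spinner or waiter can
 * compete for it. Returns true if the lock was handed off.
 */
static bool demo_mutex_unlock(struct demo_mutex *m, struct demo_task *me)
{
	assert(m->owner == me);

	if (m->donor) {
		m->owner = m->donor;	/* forced handoff to the donor */
		m->donor = NULL;
		return true;
	}
	m->owner = NULL;		/* free: optimistic spinners may take it */
	return false;
}

/* Optimistic path: succeeds only if the lock is currently free. */
static bool demo_mutex_trylock(struct demo_mutex *m, struct demo_task *t)
{
	if (m->owner)
		return false;
	m->owner = t;
	return true;
}
```

In this model a spinner can never steal the lock from a waiting
donor, while the no-donor path is left unchanged.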


Issues still to address:
---------
* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants.

* CFS load balancing. There was concern that blocked tasks may
  carry forward load (PELT) to the lock owner's CPU, so the CPU
  may look like it is overloaded. Needs investigation.

* The sleeping owner handling (where we deactivate waiting tasks
  and enqueue them onto a list, then reactivate them when the
  owner wakes up) doesn’t feel great. This is in part because
  when we want to activate tasks, we’re already holding a
  task.pi_lock and a rq_lock, just not the locks for the task
  we’re activating, nor the rq we’re enqueuing it onto. So there
  has to be a bit of lock juggling to drop and acquire the right
  locks (in the right order). It feels like there’s got to be a
  better way. Also needs some rework to get rid of the recursion.
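
The lock juggling described in the last item can be pictured
with a small userspace sketch: we hold one lock, need a second,
and must respect a fixed acquisition order. Ordering by address
is an assumption chosen for the sketch, and the toy_lock type is
a hypothetical stand-in rather than the kernel's rq lock or
pi_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy spinlock; NOT the kernel's raw_spinlock_t. */
struct toy_lock {
	atomic_flag flag;
	bool is_held;	/* bookkeeping for single-threaded demos only */
};

static void toy_lock_init(struct toy_lock *l)
{
	atomic_flag_clear(&l->flag);
	l->is_held = false;
}

static void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
	l->is_held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->is_held);
	l->is_held = false;
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/*
 * The caller holds @have and also needs @want. Locks are taken in
 * address order: if @want sorts first, @have is dropped, @want is
 * taken, and @have is retaken.
 *
 * Returns true if @have was dropped along the way, in which case
 * the caller must revalidate anything it read under @have.
 */
static bool toy_lock_second(struct toy_lock *have, struct toy_lock *want)
{
	assert(have != want);
	assert(have->is_held);

	if ((uintptr_t)have < (uintptr_t)want) {
		toy_lock_acquire(want);
		return false;
	}
	toy_lock_release(have);
	toy_lock_acquire(want);
	toy_lock_acquire(have);
	return true;
}
```

The "true" return is the awkward part: once the first lock has
been dropped, the state it protected may have changed, so the
caller has to recheck it before going on.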


Credit/Disclaimer:
---------
As mentioned previously, this Proxy Execution series has a long
history: first described in a paper[3] by Watkins, Straub, and
Niehaus, then in patches from Peter Zijlstra, and extended with
lots of work by Juri Lelli, Valentin Schneider, and Connor
O'Brien (and thank you to Steven Rostedt for providing
additional details here!).

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely
mine.

Thanks so much!
-john

[1] https://lore.kernel.org/lkml/20240224001153.2584030-1-jstultz@google.com/
[2] https://lore.kernel.org/lkml/c26251d2-e1bf-e5c7-0636-12ad886e1ea8@amd.com/
[3] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes <joelaf@...gle.com>
Cc: Qais Yousef <qyousef@...gle.com>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Zimuzo Ezeozue <zezeozue@...gle.com>
Cc: Youssef Esmat <youssefesmat@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
Cc: Will Deacon <will@...nel.org>
Cc: Waiman Long <longman@...hat.com>
Cc: Boqun Feng <boqun.feng@...il.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Metin Kaya <Metin.Kaya@....com>
Cc: Xuewen Yan <xuewen.yan94@...il.com>
Cc: K Prateek Nayak <kprateek.nayak@....com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: kernel-team@...roid.com


Connor O'Brien (2):
  sched: Add do_push_task helper
  sched: Consolidate pick_*_task to task_is_pushable helper

John Stultz (1):
  sched: Split out __schedule() deactivate task logic into a helper

Juri Lelli (2):
  locking/mutex: Make mutex::wait_lock irq safe
  locking/mutex: Expose __mutex_owner()

Peter Zijlstra (2):
  locking/mutex: Remove wakeups from under mutex::wait_lock
  sched: Split scheduler and execution contexts

 kernel/locking/mutex.c       |  60 +++++++----------
 kernel/locking/mutex.h       |  25 +++++++
 kernel/locking/rtmutex.c     |  26 +++++---
 kernel/locking/rwbase_rt.c   |   4 +-
 kernel/locking/rwsem.c       |   4 +-
 kernel/locking/spinlock_rt.c |   3 +-
 kernel/locking/ww_mutex.h    |  49 ++++++++------
 kernel/sched/core.c          | 122 +++++++++++++++++++++--------------
 kernel/sched/deadline.c      |  53 ++++++---------
 kernel/sched/fair.c          |  18 +++---
 kernel/sched/rt.c            |  59 +++++++----------
 kernel/sched/sched.h         |  44 ++++++++++++-
 12 files changed, 268 insertions(+), 199 deletions(-)

-- 
2.44.0.291.gc1ea87d7ee-goog

