Message-ID: <20250312221147.1865364-1-jstultz@google.com>
Date: Wed, 12 Mar 2025 15:11:30 -0700
From: John Stultz <jstultz@...gle.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: John Stultz <jstultz@...gle.com>, Joel Fernandes <joelagnelf@...dia.com>, 
	Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Valentin Schneider <vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>, 
	Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Suleiman Souhlal <suleiman@...gle.com>, kernel-team@...roid.com
Subject: [RFC PATCH v15 0/7] Single RunQueue Proxy Execution (v15)

Hey All,

After sending out the previous version of this series and
getting some great feedback from Peter, I was pulled into a few
other directions for a bit. But I’ve been able to get back to
the proxy work the last few weeks and wanted to send this
iteration out in preparation for discussions at OSPM next week.

So here is v15 of the Proxy Execution series, a generalized form
of priority inheritance.

As I’m trying to submit this work in smallish digestible pieces,
in this series I’m only submitting for review the logic that
allows us to do the proxying if the lock owner is on the same
runqueue as the blocked waiter. That covers: introducing the
CONFIG_SCHED_PROXY_EXEC option and boot argument, reworking the
task_struct::blocked_on pointer and wrapper functions, the
initial sketch of the find_proxy_task() logic, some fixes for
using split contexts, and finally same-runqueue proxying.
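
To make the same-runqueue restriction concrete, here is a minimal
user-space sketch of the chain walk. It is not code from the series:
the toy_* types, the integer runqueue id, and the "return NULL to
pick again" convention are all assumptions made for illustration,
and the real find_proxy_task() has to deal with locking and task
state that this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model; none of these types are the kernel's. */
struct toy_task;

struct toy_mutex {
	struct toy_task *owner;		/* NULL when the mutex is unlocked */
};

struct toy_task {
	const char *name;
	int rq;				/* hypothetical runqueue id */
	struct toy_mutex *blocked_on;	/* NULL when the task can run */
};

#define TOY_MAX_CHAIN 64		/* arbitrary bound; also stops cycles */

/*
 * Starting from the task the scheduler selected (the "donor"), follow
 * blocked_on -> owner links until reaching a task that is not blocked.
 * Returns the task to execute on the donor's behalf, or NULL when the
 * chain cannot be resolved on the donor's runqueue.
 */
static struct toy_task *toy_find_proxy_task(struct toy_task *donor)
{
	struct toy_task *p = donor;
	int depth;

	assert(donor != NULL);

	for (depth = 0; depth < TOY_MAX_CHAIN; depth++) {
		struct toy_mutex *m = p->blocked_on;
		struct toy_task *owner;

		if (!m)
			return p;	/* runnable: execute this task */

		owner = m->owner;
		if (!owner)
			return NULL;	/* mutex was released: pick again */
		if (owner->rq != donor->rq)
			return NULL;	/* owner elsewhere: out of scope here */

		p = owner;		/* keep walking the chain */
	}
	return NULL;			/* chain too long or cyclic */
}
```

With A blocked on a mutex owned by B, and B blocked on a mutex owned
by C, all on one runqueue, toy_find_proxy_task(&A) returns C. If any
owner in the chain sits on another runqueue, the toy model gives up,
which is the case this series leaves for later patches.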

With v15, I’ve tried to address some of Peter’s feedback,
splitting apart some patches so they are easier to review, and
breaking out some functionality that is not yet needed for
single-runqueue proxying, so that it can be introduced later,
closer to where it is necessary.

I’ve also continued working on the rest of the series, which
you can find here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v15-6.14-rc6/
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v15-6.14-rc6

New changes in the full series include:
* Having CONFIG_SCHED_PROXY_EXEC depend on EXPERT for now, as
  its use has pretty narrow value until we get to multi-runqueue
  proxying.
* Improved naming consistency and use of the guard macro where
  appropriate
* Moving the blocked_on_state logic to later in the series
* Improved comments
* Build fixes for !CONFIG_SMP
* Moving the zap_balance_callback() logic to later in the series
* Fixes for when sched_proxy_exec() is disabled

Issues still to address with the full series:
* Peter suggested that, instead of using
  (blocked_on_state == BO_WAKING) when tasks become unblocked as
  a guard against running proxy-migrated tasks on CPUs they are
  not affined to, we could dequeue tasks first and then wake
  them. This does look to be cleaner in many ways, but the
  locking rework is significant and I’ve not worked out all the
  kinks with it yet.
* In the full series with proxy migration (and again, for
  clarity not with this same-rq proxying series I’m sending out
  here), I still am using some workarounds to avoid hitting some
  rare cases of what seem to be lost wakeups, where a task was
  marked as BO_WAKING, and the mutex it is blocked on has no
  owner, but the wakeup on the waiter never managed to
  transition it to BO_RUNNABLE. The workarounds handle doing the
  return migration from within find_proxy_task() but I still
  feel that those fixups shouldn’t be necessary, so I suspect
  the mutex unlock or ttwu logic has a race somewhere I’m
  missing.
* One new issue I found with the workarounds mentioned in the
  previous bullet is that they can cause warnings during CPU
  hotplug if we try to do manual return-migration to
  task->wake_cpu and that CPU is offline.
* K Prateek Nayak did some testing a bit over a year ago with an
  earlier version of the series and saw ~3-5% regressions in
  some cases. I’m hoping to look into this soon to see if we can
  reduce those further.
* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants (despite this being broken in vanilla
  upstream with RT_PUSH_IPI currently)
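
As a rough illustration of the BO_WAKING guard mentioned in the
bullets above (not code from the series), the blocked_on_state
handling can be pictured as a small state machine. Only BO_WAKING,
BO_RUNNABLE and wake_cpu appear in this mail; the TOY_BO_BLOCKED
state, the toy_* names and the affinity bitmask are assumptions
made for this sketch.

```c
#include <assert.h>

/* Toy states; TOY_BO_BLOCKED is an assumed name for "still waiting". */
enum toy_bo_state { TOY_BO_BLOCKED, TOY_BO_WAKING, TOY_BO_RUNNABLE };

struct toy_waiter {
	enum toy_bo_state state;
	int cpu;			/* CPU whose runqueue holds the task */
	unsigned long allowed_cpus;	/* hypothetical affinity bitmask */
	int wake_cpu;			/* CPU to return-migrate to */
};

/* Unlock side: the waiter is no longer blocked, but not yet runnable. */
static void toy_mark_waking(struct toy_waiter *w)
{
	if (w->state == TOY_BO_BLOCKED)
		w->state = TOY_BO_WAKING;
}

/*
 * Wakeup side: if proxy migration left the task on a CPU outside its
 * affinity mask, move it back first, then let it run.
 */
static void toy_wake(struct toy_waiter *w)
{
	if (w->state != TOY_BO_WAKING)
		return;

	if (!(w->allowed_cpus & (1UL << w->cpu))) {
		/* The sketch assumes wake_cpu is a valid target. */
		assert(w->allowed_cpus & (1UL << w->wake_cpu));
		w->cpu = w->wake_cpu;
	}
	w->state = TOY_BO_RUNNABLE;
}

/* The guard: only a task that completed the wakeup may be run. */
static int toy_may_run(const struct toy_waiter *w)
{
	return w->state == TOY_BO_RUNNABLE;
}
```

In this model, a lost wakeup corresponds to toy_mark_waking() having
run without toy_wake() ever following it, so the task stays in
TOY_BO_WAKING and toy_may_run() keeps returning 0.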

I’d really appreciate any feedback or review thoughts on this
series. I’m trying to keep the chunks small, reviewable and
iteratively testable, but if you have any suggestions on how to
improve the series, I’m all ears.

Credit/Disclaimer:
------------------
As mentioned previously, this Proxy Execution series has a long
history:

It was first described in a paper[1] by Watkins, Straub, and
Niehaus, then followed by patches from Peter Zijlstra, which were
extended with lots of work by Juri Lelli, Valentin Schneider, and
Connor O'Brien. (And thank you to Steven Rostedt for providing
additional details here!)

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely mine.

Thanks so much!
-john

[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf


John Stultz (3):
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Add an initial sketch of the find_proxy_task() function

Peter Zijlstra (2):
  locking/mutex: Rework task_struct::blocked_on
  sched: Start blocked_on chain processing in find_proxy_task()

Valentin Schneider (2):
  locking/mutex: Add p->blocked_on wrappers for correctness checks
  sched: Fix proxy/current (push,pull)ability

 .../admin-guide/kernel-parameters.txt         |   5 +
 include/linux/sched.h                         |  62 +++-
 init/Kconfig                                  |  10 +
 kernel/fork.c                                 |   3 +-
 kernel/locking/mutex-debug.c                  |   9 +-
 kernel/locking/mutex.c                        |  11 +
 kernel/locking/mutex.h                        |   3 +-
 kernel/locking/ww_mutex.h                     |  16 +-
 kernel/sched/core.c                           | 266 +++++++++++++++++-
 kernel/sched/fair.c                           |  31 +-
 kernel/sched/rt.c                             |  15 +-
 kernel/sched/sched.h                          |  22 +-
 12 files changed, 423 insertions(+), 30 deletions(-)

-- 
2.49.0.rc0.332.g42c0ae87b1-goog

