Message-ID: <20240625114249.289014-1-npiggin@gmail.com>
Date: Tue, 25 Jun 2024 21:42:43 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: Tejun Heo <tj@...nel.org>
Cc: Nicholas Piggin <npiggin@...il.com>,
"Paul E . McKenney" <paulmck@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org
Subject: [PATCH 0/4] Fix scalability problem in workqueue watchdog touch caused by stop_machine
Here are a few patches to fix a lockup. Progress becomes very slow
because the workqueue watchdog touch scales poorly when it is hammered by
thousands of CPUs in multi_cpu_stop. Patch 2 is the fix.
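To illustrate the kind of contention described above, here is a minimal userspace sketch, not the patch itself. It assumes the problem is many CPUs storing to one shared timestamp, and avoids the store when the recorded value is still fresh. The names watchdog_touched, watchdog_touch() and ticks_after(), and the quarter-of-threshold cutoff, are hypothetical choices for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared "last touched" timestamp. */
static _Atomic unsigned long watchdog_touched;

/* Wrap-safe "a is later than b", in the style of a jiffies comparison. */
static bool ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Touch the watchdog at time `now` (arbitrary ticks). The store to the
 * shared variable is skipped unless the recorded timestamp is more than
 * a quarter of `thresh` old, so most callers only read the cacheline.
 * Returns true if a store was performed.
 */
static bool watchdog_touch(unsigned long now, unsigned long thresh)
{
    unsigned long last = atomic_load_explicit(&watchdog_touched,
                                              memory_order_relaxed);

    assert(thresh > 0);
    if (!ticks_after(now, last + thresh / 4))
        return false;   /* fresh enough: read-only fast path */

    atomic_store_explicit(&watchdog_touched, now, memory_order_relaxed);
    return true;
}
```

With a threshold of 40 ticks, a caller at tick 100 stores, callers at ticks 105 and 110 only read, and a caller at tick 111 stores again.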
While making a microbenchmark reproducer, I noticed that the RCU call
was also causing slowdowns. It is not nearly as bad as the workqueue
touch, but workqueue queueing of dummy jobs slowed down several-fold
when lots of other CPUs were making rcu_momentary_dyntick_idle() calls,
so I did the stop_machine patches to reduce that. Patches 3 and 4 are
independent of the first two and can go in any order.
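As a rough illustration of spacing out touches in a spin loop, here is a userspace sketch under stated assumptions. It is not the stop_machine code: touch_watchdogs(), spin_with_delayed_touch() and the fake one-tick-per-poll clock are all hypothetical, and the interval is an arbitrary parameter.

```c
#include <assert.h>

/* Counts calls, standing in for the expensive watchdog/RCU touches. */
static unsigned long touch_calls;

static void touch_watchdogs(void)
{
    touch_calls++;
}

/*
 * Spin for `iterations` polls. Each poll advances a fake clock by one
 * tick, and the touch is issued at most once per `touch_interval` ticks
 * instead of on every pass through the loop.
 * Returns the number of touches issued.
 */
static unsigned long spin_with_delayed_touch(unsigned long iterations,
                                             unsigned long touch_interval)
{
    unsigned long now, last_touch = 0;

    assert(touch_interval > 0);
    touch_calls = 0;
    for (now = 1; now <= iterations; now++) {
        /* ... poll the shared state here ... */
        if (now - last_touch >= touch_interval) {
            touch_watchdogs();
            last_touch = now;
        }
    }
    return touch_calls;
}
```

In this model, 1000 polls with an interval of 100 ticks issue 10 touches rather than 1000, which is the reduction in shared traffic the delay is after.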
Thanks,
Nick
Nicholas Piggin (4):
  workqueue: wq_watchdog_touch is always called with valid CPU
  workqueue: Improve scalability of workqueue watchdog touch
  stop_machine: Rearrange multi_cpu_stop state machine loop
  stop_machine: Add a delay between multi_cpu_stop touching watchdogs

 kernel/stop_machine.c | 31 +++++++++++++++++++++++--------
 kernel/workqueue.c    | 12 ++++++++++--
 2 files changed, 33 insertions(+), 10 deletions(-)
--
2.45.1