[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251027183456.343407-1-steve.wahl@hpe.com>
Date: Mon, 27 Oct 2025 13:34:56 -0500
From: Steve Wahl <steve.wahl@....com>
To: Steve Wahl <steve.wahl@....com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>,
Ingo Molnar <mingo@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org
Cc: Russ Anderson <rja@....com>, Dimitri Sivanich <sivanich@....com>,
Kyle Meyer <kyle.meyer@....com>
Subject: [PATCH v2] tick/sched: Limit non-timekeeper CPUs calling jiffies update
On large NUMA systems, while running a test program that saturates the
inter-processor and inter-NUMA links, acquiring the jiffies_lock can
be very expensive. If the cpu designated to do jiffies updates
(tick_do_timer_cpu) gets delayed and other cpus decide to do the
jiffies update themselves, a large number of them decide to do so at
the same time. The inexpensive check against tick_next_period is far
quicker than actually acquiring the lock, so most of these get in line
to obtain the lock. If obtaining the lock is slow enough, this
spirals into the vast majority of CPUs continuously being stuck
waiting for this lock, just to obtain it and find out that time has
already been updated by another cpu. For example, on one random entry
to kdb by manually-injected NMI, I saw 2912 of 3840 cpus stuck here.
To avoid this, allow only one non-timekeeper CPU to call
tick_do_update_jiffies64() at any given time, resetting ts->stalled
jiffies only if the jiffies update function is actually called.
With this change, manually interrupting the test I find at most two
CPUs in the tick_do_update_jiffies64 function (the timekeeper and one
other).
Signed-off-by: Steve Wahl <steve.wahl@....com>
---
v2: Rewritten to use an atomic to gate non-timekeeping cpus calling the
jiffies update, as suggested by tglx. Title of patch has changed
since trylock is no longer used.
v1 discussion: https://lore.kernel.org/all/20251013150959.298288-1-steve.wahl@hpe.com/
kernel/time/tick-sched.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c527b421c865..3ff3eb1f90d0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -201,6 +201,27 @@ static inline void tick_sched_flag_clear(struct tick_sched *ts,
ts->flags &= ~flag;
}
+/*
+ * Allow only one non-timekeeper CPU at a time update jiffies from
+ * the timer tick.
+ *
+ * Returns true if update was run.
+ */
+static bool tick_limited_update_jiffies64(struct tick_sched *ts, ktime_t now)
+{
+ static atomic_t in_progress;
+ int inp;
+
+ inp = atomic_read(&in_progress);
+ if (inp || !atomic_try_cmpxchg(&in_progress, &inp, 1))
+ return false;
+
+ if (ts->last_tick_jiffies == jiffies)
+ tick_do_update_jiffies64(now);
+ atomic_set(&in_progress, 0);
+ return true;
+}
+
#define MAX_STALLED_JIFFIES 5
static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
@@ -239,10 +260,11 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
ts->stalled_jiffies = 0;
ts->last_tick_jiffies = READ_ONCE(jiffies);
} else {
- if (++ts->stalled_jiffies == MAX_STALLED_JIFFIES) {
- tick_do_update_jiffies64(now);
- ts->stalled_jiffies = 0;
- ts->last_tick_jiffies = READ_ONCE(jiffies);
+ if (++ts->stalled_jiffies >= MAX_STALLED_JIFFIES) {
+ if (tick_limited_update_jiffies64(ts, now)) {
+ ts->stalled_jiffies = 0;
+ ts->last_tick_jiffies = READ_ONCE(jiffies);
+ }
}
}
--
2.43.0
Powered by blists - more mailing lists