[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250323173857.372390-1-gpiccoli@igalia.com>
Date: Sun, 23 Mar 2025 14:36:24 -0300
From: "Guilherme G. Piccoli" <gpiccoli@...lia.com>
To: linux-kernel@...r.kernel.org
Cc: jstultz@...gle.com,
tglx@...utronix.de,
sboyd@...nel.org,
paulmck@...nel.org,
feng.tang@...el.com,
kernel@...ccoli.net,
kernel-dev@...lia.com,
"Guilherme G. Piccoli" <gpiccoli@...lia.com>,
Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Subject: [PATCH] clocksource: Fix the CPUs' choice in the watchdog percpu verification
Right now, if the clocksource watchdog detects a clocksource skew, it
might perform a percpu checking, for example in the TSC case on x86.
In other words: supposing TSC is detected as unstable by the
clocksource watchdog running at CPU1, as part of marking TSC as
unstable the kernel will also run a check of TSC readings in some CPUs
other than CPU1 to be sure it is synced between them all.
But that check might happen only in some CPUs, not all of them; this
choice is based on the parameter "verify_n_cpus" and in some random
cpumask calculation. So, the watchdog runs such percpu check in
up to "verify_n_cpus" random CPUs among all online CPUs, with the risk
of repeating CPUs (that aren't double checked) in the cpumask random
calculation.
But if "verify_n_cpus" > num_online_cpus(), we could skip the random
calculation and just go ahead and check the clocksource sync between
all online CPUs, without the risk of skipping some CPUs due to
duplicity in the random cpumask calculation. That approach is exactly
what is proposed here.
Tests in a 4 CPU laptop with TSC skew detected led to some cases of
the percpu verification skipping some CPU even with verify_n_cpus=8,
due to the duplicity on random cpumask generation. With this patch,
the issue is fixed.
Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@...lia.com>
---
Hey folks, thanks in advance for reviews / suggestions!
Special thanks to Cascardo for the suggestion of checking
verify_n_cpus against the number of online CPUs - definitely
improved the idea!
I think this could be backported to stable if makes sense;
I can do it, please let me know your opinion.
Cheers,
Guilherme
kernel/time/clocksource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 2a7802ec480c..a32732dab27e 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -310,7 +310,7 @@ static void clocksource_verify_choose_cpus(void)
{
int cpu, i, n = verify_n_cpus;
- if (n < 0) {
+ if ((n < 0) || (n >= num_online_cpus())) {
/* Check all of the CPUs. */
cpumask_copy(&cpus_chosen, cpu_online_mask);
cpumask_clear_cpu(smp_processor_id(), &cpus_chosen);
--
2.48.1
Powered by blists - more mailing lists