lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250323173857.372390-1-gpiccoli@igalia.com>
Date: Sun, 23 Mar 2025 14:36:24 -0300
From: "Guilherme G. Piccoli" <gpiccoli@...lia.com>
To: linux-kernel@...r.kernel.org
Cc: jstultz@...gle.com,
	tglx@...utronix.de,
	sboyd@...nel.org,
	paulmck@...nel.org,
	feng.tang@...el.com,
	kernel@...ccoli.net,
	kernel-dev@...lia.com,
	"Guilherme G. Piccoli" <gpiccoli@...lia.com>,
	Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Subject: [PATCH] clocksource: Fix the CPUs' choice in the watchdog percpu verification

Right now, if the clocksource watchdog detects a clocksource skew, it
might perform a percpu checking, for example in the TSC case on x86.
In other words: supposing TSC is detected as unstable by the
clocksource watchdog running at CPU1, as part of marking TSC as
unstable the kernel will also run a check of TSC readings in some CPUs
other than CPU1 to be sure it is synced between them all.

But that check might happen only in some CPUs, not all of them; this
choice is based on the parameter "verify_n_cpus" and in some random
cpumask calculation. So, the watchdog runs such percpu check in
up to "verify_n_cpus" random CPUs among all online CPUs, with the risk
of repeating CPUs (that aren't double checked) in the cpumask random
calculation.

But if "verify_n_cpus" > num_online_cpus(), we could skip the random
calculation and just go ahead and check the clocksource sync between
all online CPUs, without the risk of skipping some CPUs due to
duplicity in the random cpumask calculation. That approach is exactly
what is proposed here.

Tests in a 4 CPU laptop with TSC skew detected led to some cases of
the percpu verification skipping some CPU even with verify_n_cpus=8,
due to the duplicity on random cpumask generation. With this patch,
the issue is fixed.

Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@...lia.com>

---


Hey folks, thanks in advance for reviews / suggestions!

Special thanks to Cascardo for the suggestion of checking
verify_n_cpus against the number of online CPUs - definitely
improved the idea!

I think this could be backported to stable if makes sense;
I can do it, please let me know your opinion.
Cheers,

Guilherme


 kernel/time/clocksource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 2a7802ec480c..a32732dab27e 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -310,7 +310,7 @@ static void clocksource_verify_choose_cpus(void)
 {
 	int cpu, i, n = verify_n_cpus;
 
-	if (n < 0) {
+	if ((n < 0) || (n >= num_online_cpus())) {
 		/* Check all of the CPUs. */
 		cpumask_copy(&cpus_chosen, cpu_online_mask);
 		cpumask_clear_cpu(smp_processor_id(), &cpus_chosen);
-- 
2.48.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ