lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5d68cd3a-8be6-4979-815b-5b09a9a46165@paulmck-laptop>
Date: Sun, 23 Mar 2025 13:40:43 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: "Guilherme G. Piccoli" <gpiccoli@...lia.com>
Cc: linux-kernel@...r.kernel.org, jstultz@...gle.com, tglx@...utronix.de,
	sboyd@...nel.org, feng.tang@...el.com, kernel@...ccoli.net,
	kernel-dev@...lia.com,
	Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Subject: Re: [PATCH] clocksource: Fix the CPUs' choice in the watchdog percpu
 verification

On Sun, Mar 23, 2025 at 02:36:24PM -0300, Guilherme G. Piccoli wrote:
> Right now, if the clocksource watchdog detects a clocksource skew, it
> might perform a percpu checking, for example in the TSC case on x86.
> In other words: supposing TSC is detected as unstable by the
> clocksource watchdog running at CPU1, as part of marking TSC as
> unstable the kernel will also run a check of TSC readings in some CPUs
> other than CPU1 to be sure it is synced between them all.
> 
> But that check might happen only in some CPUs, not all of them; this
> choice is based on the parameter "verify_n_cpus" and in some random
> cpumask calculation. So, the watchdog runs such percpu check in
> up to "verify_n_cpus" random CPUs among all online CPUs, with the risk
> of repeating CPUs (that aren't double checked) in the cpumask random
> calculation.
> 
> But if "verify_n_cpus" > num_online_cpus(), we could skip the random
> calculation and just go ahead and check the clocksource sync between
> all online CPUs, without the risk of skipping some CPUs due to
> duplicity in the random cpumask calculation. That approach is exactly
> what is proposed here.
> 
> Tests in a 4 CPU laptop with TSC skew detected led to some cases of
> the percpu verification skipping some CPU even with verify_n_cpus=8,
> due to the duplicity on random cpumask generation. With this patch,
> the issue is fixed.
> 
> Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@...lia.com>

Reviewed-by: Paul E. McKenney <paulmck@...nel.org>

> ---
> 
> 
> Hey folks, thanks in advance for reviews / suggestions!
> 
> Special thanks to Cascardo for the suggestion of checking
> verify_n_cpus against the number of online CPUs - definitely
> improved the idea!
> 
> I think this could be backported to stable if makes sense;
> I can do it, please let me know your opinion.
> Cheers,
> 
> Guilherme
> 
> 
>  kernel/time/clocksource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 2a7802ec480c..a32732dab27e 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -310,7 +310,7 @@ static void clocksource_verify_choose_cpus(void)
>  {
>  	int cpu, i, n = verify_n_cpus;
>  
> -	if (n < 0) {
> +	if ((n < 0) || (n >= num_online_cpus())) {
>  		/* Check all of the CPUs. */
>  		cpumask_copy(&cpus_chosen, cpu_online_mask);
>  		cpumask_clear_cpu(smp_processor_id(), &cpus_chosen);
> -- 
> 2.48.1
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ