lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250415193100.562d55b34851e8c9058c5658@linux-foundation.org>
Date: Tue, 15 Apr 2025 19:31:00 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Luo Gengkun <luogengkun@...weicloud.com>
Cc: joel.granados@...nel.org, song@...nel.org, dianders@...omium.org,
 tglx@...utronix.de, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] watchdog: Fix watchdog may detect false positive of
 softlockup

On Wed, 16 Apr 2025 01:39:22 +0000 Luo Gengkun <luogengkun@...weicloud.com> wrote:

> The watchdog may dectect false positive of softlockup because of stop
> softlockup after update watchdog_thresh. The problem can be described as
> follow:
> 
>  # We asuume previous watchdog_thresh is 60, so the timer is coming every
>  # 24s.
> echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
> |
> +------>+ update watchdog_thresh (We are in kernel now)
> 	|
> 	|
> 	+------>+ watchdog hrtimer (irq context: detect softlockup)
> 		|
> 		|
> 	+-------+
> 	|
> 	|
> 	+ softlockup_stop_all
> 
> As showed above, there is a window between update watchdog_thresh and
> softlockup_stop_all. During this window, if a timer is coming, a false
> positive of softlockup will happen. To fix this problem, use a shadow
> variable to store the new value and write back to watchdog_thresh after
> softlockup_stop_all.
> 

Changelog is a bit hard to follow.  I asked gemini.google.com to clean
it up and it produced this:

: The watchdog may detect a false positive softlockup due to stopping the
: softlockup detection after updating `watchdog_thresh`.  The problem can
: be described as follows:
: 
: ```
: # Assume the previous watchdog_thresh is 60, so the timer triggers every 24 seconds.
: echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
: |
: +------>+ Update watchdog_thresh (Kernel space)
: 	|
: 	|
: 	+------>+ Watchdog hrtimer (irq context: detect softlockup)
: 		|
: 		|
: 	+-------+
: 	|
: 	|
: 	+ softlockup_stop_all
: ```
: 
: As shown above, there is a window between updating `watchdog_thresh`
: and `softlockup_stop_all`.  During this window, if a timer triggers, a
: false positive softlockup can occur.  To fix this problem, a shadow
: variable should be used to store the new value, and this value should
: be written back to `watchdog_thresh` only after `softlockup_stop_all`
: has completed.  

I don't know how accurate this is - please check&fix it and consider
incorporating the result?

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -47,6 +47,7 @@ int __read_mostly watchdog_user_enabled = 1;
>  static int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT;
>  static int __read_mostly watchdog_softlockup_user_enabled = 1;
>  int __read_mostly watchdog_thresh = 10;
> +static int __read_mostly watchdog_thresh_shadow;
>  static int __read_mostly watchdog_hardlockup_available;
>  
>  struct cpumask watchdog_cpumask __read_mostly;
> @@ -876,6 +877,7 @@ static void __lockup_detector_reconfigure(void)
>  	watchdog_hardlockup_stop();
>  
>  	softlockup_stop_all();
> +	watchdog_thresh = READ_ONCE(watchdog_thresh_shadow);

I expect a reader of this code will wonder "what's all this
watchdog_thresh_shadow stuff".  Can you please add a few small comments
to explain why we're doing this?



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ