Message-ID: <e216e4ae-b882-454d-be8f-24f21a3549d9@linux.dev>
Date: Wed, 17 Dec 2025 14:27:22 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: lirongqing <lirongqing@...du.com>
Cc: Nicholas Piggin <npiggin@...il.com>, Christophe Leroy
 <chleroy@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
 Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
 Yonghong Song <yonghong.song@...ux.dev>,
 John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
 Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
 Jiri Olsa <jolsa@...nel.org>, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
 linux-aspeed@...ts.ozlabs.org, linux-openrisc@...r.kernel.org,
 linuxppc-dev@...ts.ozlabs.org, dri-devel@...ts.freedesktop.org,
 bpf@...r.kernel.org, linux-kselftest@...r.kernel.org,
 wireguard@...ts.zx2c4.com, netdev@...r.kernel.org,
 Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] watchdog: softlockup: panic when lockup duration exceeds
 N thresholds



On 2025/12/16 15:45, lirongqing wrote:
> From: Li RongQing <lirongqing@...du.com>
> 
> The softlockup_panic sysctl is currently a binary option: panic immediately
> or never panic on soft lockups.
> 
> Panicking on any soft lockup, regardless of duration, can be overly
> aggressive for brief stalls that may be caused by legitimate operations.
> Conversely, never panicking may allow severe system hangs to persist
> undetected.
> 
> Extend softlockup_panic to accept an integer threshold, allowing the kernel
> to panic only when the normalized lockup duration exceeds N watchdog
> threshold periods. This provides finer-grained control to distinguish
> between transient delays and persistent system failures.
> 
> The accepted values are:
> - 0: Don't panic (unchanged)
> - 1: Panic when duration >= 1 * threshold (20s default, original behavior)
> - N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.)
> 
> The original behavior is preserved for values 0 and 1, maintaining full
> backward compatibility while allowing systems to tolerate brief lockups
> while still catching severe, persistent hangs.

Thanks! Just a couple of minor things below ;)

> 
> Signed-off-by: Li RongQing <lirongqing@...du.com>
> ---
>   Documentation/admin-guide/kernel-parameters.txt      | 10 +++++-----
>   arch/arm/configs/aspeed_g5_defconfig                 |  2 +-
>   arch/arm/configs/pxa3xx_defconfig                    |  2 +-
>   arch/openrisc/configs/or1klitex_defconfig            |  2 +-
>   arch/powerpc/configs/skiroot_defconfig               |  2 +-
>   drivers/gpu/drm/ci/arm.config                        |  2 +-
>   drivers/gpu/drm/ci/arm64.config                      |  2 +-
>   drivers/gpu/drm/ci/x86_64.config                     |  2 +-
>   kernel/watchdog.c                                    |  8 +++++---
>   lib/Kconfig.debug                                    | 13 +++++++------
>   tools/testing/selftests/bpf/config                   |  2 +-
>   tools/testing/selftests/wireguard/qemu/kernel.config |  2 +-
>   12 files changed, 26 insertions(+), 23 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a8d0afd..27c5f96 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6934,12 +6934,12 @@ Kernel parameters
>   
>   	softlockup_panic=
>   			[KNL] Should the soft-lockup detector generate panics.
> -			Format: 0 | 1
> +			Format: <int>
>   
> -			A value of 1 instructs the soft-lockup detector
> -			to panic the machine when a soft-lockup occurs. It is
> -			also controlled by the kernel.softlockup_panic sysctl
> -			and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
> +			A value of non-zero instructs the soft-lockup detector
> +			to panic the machine when a soft-lockup duration exceeds
> +			N thresholds. It is also controlled by the kernel.softlockup_panic
> +			sysctl and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
>   			respective build-time switch to that functionality.

kernel/configs/debug.config still seems to use the old bool form,
"# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set" ...

Now that the option is an int, should that become
"CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=0"?

>   
>   	softlockup_all_cpu_backtrace=
> diff --git a/arch/arm/configs/aspeed_g5_defconfig b/arch/arm/configs/aspeed_g5_defconfig
> index 2e6ea13..ec558e5 100644
> --- a/arch/arm/configs/aspeed_g5_defconfig
> +++ b/arch/arm/configs/aspeed_g5_defconfig
> @@ -306,7 +306,7 @@ CONFIG_SCHED_STACK_END_CHECK=y
>   CONFIG_PANIC_ON_OOPS=y
>   CONFIG_PANIC_TIMEOUT=-1
>   CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>   CONFIG_BOOTPARAM_HUNG_TASK_PANIC=1
>   CONFIG_WQ_WATCHDOG=y
>   # CONFIG_SCHED_DEBUG is not set
> diff --git a/arch/arm/configs/pxa3xx_defconfig b/arch/arm/configs/pxa3xx_defconfig
> index 07d422f..fb272e3 100644
> --- a/arch/arm/configs/pxa3xx_defconfig
> +++ b/arch/arm/configs/pxa3xx_defconfig
> @@ -100,7 +100,7 @@ CONFIG_PRINTK_TIME=y
>   CONFIG_DEBUG_KERNEL=y
>   CONFIG_MAGIC_SYSRQ=y
>   CONFIG_DEBUG_SHIRQ=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>   # CONFIG_SCHED_DEBUG is not set
>   CONFIG_DEBUG_SPINLOCK=y
>   CONFIG_DEBUG_SPINLOCK_SLEEP=y
> diff --git a/arch/openrisc/configs/or1klitex_defconfig b/arch/openrisc/configs/or1klitex_defconfig
> index fb1eb9a..984b0e3 100644
> --- a/arch/openrisc/configs/or1klitex_defconfig
> +++ b/arch/openrisc/configs/or1klitex_defconfig
> @@ -52,5 +52,5 @@ CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,bpf"
>   CONFIG_PRINTK_TIME=y
>   CONFIG_PANIC_ON_OOPS=y
>   CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>   CONFIG_BUG_ON_DATA_CORRUPTION=y
> diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
> index 2b71a6d..a4114fc 100644
> --- a/arch/powerpc/configs/skiroot_defconfig
> +++ b/arch/powerpc/configs/skiroot_defconfig
> @@ -289,7 +289,7 @@ CONFIG_SCHED_STACK_END_CHECK=y
>   CONFIG_DEBUG_STACKOVERFLOW=y
>   CONFIG_PANIC_ON_OOPS=y
>   CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>   CONFIG_HARDLOCKUP_DETECTOR=y
>   CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
>   CONFIG_WQ_WATCHDOG=y
> diff --git a/drivers/gpu/drm/ci/arm.config b/drivers/gpu/drm/ci/arm.config
> index 411e814..d7c5167 100644
> --- a/drivers/gpu/drm/ci/arm.config
> +++ b/drivers/gpu/drm/ci/arm.config
> @@ -52,7 +52,7 @@ CONFIG_TMPFS=y
>   CONFIG_PROVE_LOCKING=n
>   CONFIG_DEBUG_LOCKDEP=n
>   CONFIG_SOFTLOCKUP_DETECTOR=n
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=n
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=0
>   
>   CONFIG_FW_LOADER_COMPRESS=y
>   
> diff --git a/drivers/gpu/drm/ci/arm64.config b/drivers/gpu/drm/ci/arm64.config
> index fddfbd4..ea0e307 100644
> --- a/drivers/gpu/drm/ci/arm64.config
> +++ b/drivers/gpu/drm/ci/arm64.config
> @@ -161,7 +161,7 @@ CONFIG_TMPFS=y
>   CONFIG_PROVE_LOCKING=n
>   CONFIG_DEBUG_LOCKDEP=n
>   CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>   
>   CONFIG_DETECT_HUNG_TASK=y
>   
> diff --git a/drivers/gpu/drm/ci/x86_64.config b/drivers/gpu/drm/ci/x86_64.config
> index 8eaba388..7ac98a7 100644
> --- a/drivers/gpu/drm/ci/x86_64.config
> +++ b/drivers/gpu/drm/ci/x86_64.config
> @@ -47,7 +47,7 @@ CONFIG_TMPFS=y
>   CONFIG_PROVE_LOCKING=n
>   CONFIG_DEBUG_LOCKDEP=n
>   CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>   
>   CONFIG_DETECT_HUNG_TASK=y
>   
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 0685e3a..a5fa116 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -363,7 +363,7 @@ static struct cpumask watchdog_allowed_mask __read_mostly;
>   
>   /* Global variables, exported for sysctl */
>   unsigned int __read_mostly softlockup_panic =
> -			IS_ENABLED(CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC);
> +			CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC;
>   
>   static bool softlockup_initialized __read_mostly;
>   static u64 __read_mostly sample_period;
> @@ -879,7 +879,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>   
>   		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>   		sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
> -		if (softlockup_panic)
> +		duration = duration / get_softlockup_thresh();

Nit: overwriting "duration" with a threshold count makes things a bit
confusing, maybe just use a separate temp variable?

	thresh_count = duration / get_softlockup_thresh();

	if (softlockup_panic && thresh_count >= softlockup_panic)
		panic("softlockup: hung tasks");
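
To make the rounding behavior of that check concrete, here is a small
userspace sketch. It is not the kernel code: should_panic() and its
parameters are hypothetical names for illustration, and it assumes the
duration and threshold are both whole seconds with a non-zero threshold.

```c
#include <stdbool.h>

/*
 * Userspace sketch only; should_panic() is a hypothetical helper, not a
 * kernel function.
 *
 * duration_s: how long the CPU has been stuck, in seconds
 * thresh_s:   soft lockup threshold in seconds (20 by default); must be > 0
 * panic_n:    value of softlockup_panic (0 means never panic)
 */
static bool should_panic(unsigned int duration_s, unsigned int thresh_s,
			 unsigned int panic_n)
{
	/* Integer division rounds down: 39 / 20 == 1, 40 / 20 == 2. */
	unsigned int thresh_count = duration_s / thresh_s;

	return panic_n && thresh_count >= panic_n;
}
```

With a 20s threshold and softlockup_panic=2, a 39s stall would not panic
under this sketch, while a 40s stall would.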

Cheers,
Lance

> +
> +		if (softlockup_panic && duration >= softlockup_panic)
>   			panic("softlockup: hung tasks");
>   	}
>   
> @@ -1228,7 +1230,7 @@ static const struct ctl_table watchdog_sysctls[] = {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_minmax,
>   		.extra1		= SYSCTL_ZERO,
> -		.extra2		= SYSCTL_ONE,
> +		.extra2		= SYSCTL_INT_MAX,
>   	},
>   	{
>   		.procname	= "softlockup_sys_info",
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index ba36939..17a7a77 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1110,13 +1110,14 @@ config SOFTLOCKUP_DETECTOR_INTR_STORM
>   	  the CPU stats and the interrupt counts during the "soft lockups".
>   
>   config BOOTPARAM_SOFTLOCKUP_PANIC
> -	bool "Panic (Reboot) On Soft Lockups"
> +	int "Panic (Reboot) On Soft Lockups"
>   	depends on SOFTLOCKUP_DETECTOR
> +	default 0
>   	help
> -	  Say Y here to enable the kernel to panic on "soft lockups",
> -	  which are bugs that cause the kernel to loop in kernel
> -	  mode for more than 20 seconds (configurable using the watchdog_thresh
> -	  sysctl), without giving other tasks a chance to run.
> +	  Set to a non-zero value N to enable the kernel to panic on "soft
> +	  lockups", which are bugs that cause the kernel to loop in kernel
> +	  mode for more than (N * 20 seconds) (configurable using the
> +	  watchdog_thresh sysctl), without giving other tasks a chance to run.
>   
>   	  The panic can be used in combination with panic_timeout,
>   	  to cause the system to reboot automatically after a
> @@ -1124,7 +1125,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>   	  high-availability systems that have uptime guarantees and
>   	  where a lockup must be resolved ASAP.
>   
> -	  Say N if unsure.
> +	  Say 0 if unsure.
>   
>   config HAVE_HARDLOCKUP_DETECTOR_BUDDY
>   	bool
> diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
> index 558839e..2485538 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -1,6 +1,6 @@
>   CONFIG_BLK_DEV_LOOP=y
>   CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>   CONFIG_BPF=y
>   CONFIG_BPF_EVENTS=y
>   CONFIG_BPF_JIT=y
> diff --git a/tools/testing/selftests/wireguard/qemu/kernel.config b/tools/testing/selftests/wireguard/qemu/kernel.config
> index 0504c11..bb89d2d 100644
> --- a/tools/testing/selftests/wireguard/qemu/kernel.config
> +++ b/tools/testing/selftests/wireguard/qemu/kernel.config
> @@ -80,7 +80,7 @@ CONFIG_HARDLOCKUP_DETECTOR=y
>   CONFIG_WQ_WATCHDOG=y
>   CONFIG_DETECT_HUNG_TASK=y
>   CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>   CONFIG_BOOTPARAM_HUNG_TASK_PANIC=1
>   CONFIG_PANIC_TIMEOUT=-1
>   CONFIG_STACKTRACE=y

