Date:   Tue, 1 Aug 2023 14:58:11 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Douglas Anderson <dianders@...omium.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Petr Mladek <pmladek@...e.com>,
        kernel test robot <lkp@...el.com>,
        Lecopzer Chen <lecopzer.chen@...iatek.com>,
        Pingfan Liu <kernelfans@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] watchdog/hardlockup: Avoid large stack frames in
 watchdog_hardlockup_check()

On Mon 31-07-23 09:17:59, Douglas Anderson wrote:
> After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to
> watchdog_hardlockup_check()") we started storing a `struct cpumask` on
> the stack in watchdog_hardlockup_check(). On systems with
> CONFIG_NR_CPUS set to 8192 this takes up 1K on the stack. That
> triggers warnings with `CONFIG_FRAME_WARN` set to 1024.
> 
> Instead of putting this `struct cpumask` on the stack, let's declare
> it as `static`. This has the downside of taking up 1K of memory all
> the time on systems with `CONFIG_NR_CPUS` set to 8192, but on systems with
> smaller `CONFIG_NR_CPUS` it's not much memory (with 128 CPUs it's only
> 16 bytes of memory). Presumably anyone building a system with
> `CONFIG_NR_CPUS=8192` can afford the extra 1K of memory.
> 
> NOTE: as part of this change, we no longer check the return value of
> trigger_single_cpu_backtrace(). While we could do this and only call
> cpumask_clear_cpu() if trigger_single_cpu_backtrace() didn't fail,
> that's probably not worth it. There's no reason to believe that
> trigger_cpumask_backtrace() will succeed at backtracing the CPU when
> trigger_single_cpu_backtrace() failed.
> 
> Alternatives considered:
> - Use kmalloc with GFP_ATOMIC to allocate. I decided against this
>   since relying on kmalloc when the system is hard locked up seems
>   like a bad idea.
> - Change the arch_trigger_cpumask_backtrace() across all architectures
>   to take an extra parameter to get the needed behavior. This seems
>   like a lot of churn for a small savings.
> 
> Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()")
> Reported-by: kernel test robot <lkp@...el.com>
> Closes: https://lore.kernel.org/r/202307310955.pLZDhpnl-lkp@intel.com
> Signed-off-by: Douglas Anderson <dianders@...omium.org>
> ---
> 
>  kernel/watchdog.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index be38276a365f..19db2357969a 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -151,9 +151,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  	 */
>  	if (is_hardlockup(cpu)) {
>  		unsigned int this_cpu = smp_processor_id();
> -		struct cpumask backtrace_mask;
> -
> -		cpumask_copy(&backtrace_mask, cpu_online_mask);
>  
>  		/* Only print hardlockups once. */
>  		if (per_cpu(watchdog_hardlockup_warned, cpu))
> @@ -167,10 +164,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  				show_regs(regs);
>  			else
>  				dump_stack();
> -			cpumask_clear_cpu(cpu, &backtrace_mask);
>  		} else {
> -			if (trigger_single_cpu_backtrace(cpu))
> -				cpumask_clear_cpu(cpu, &backtrace_mask);
> +			trigger_single_cpu_backtrace(cpu);
>  		}
>  
>  		/*
> @@ -178,8 +173,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  		 * hardlockups generating interleaving traces
>  		 */
>  		if (sysctl_hardlockup_all_cpu_backtrace &&
> -		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
> +		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped)) {
> +			static struct cpumask backtrace_mask;
> +
> +			cpumask_copy(&backtrace_mask, cpu_online_mask);
> +			cpumask_clear_cpu(cpu, &backtrace_mask);
>  			trigger_cpumask_backtrace(&backtrace_mask);

This looks rather wasteful: the cpumask is copied here only to be copied
again into backtrace_mask in nmi_trigger_cpumask_backtrace() (which all
arches but sparc use, AFAICS).

Would it be possible to use arch_trigger_cpumask_backtrace(cpu_online_mask, false)
and special case cpu != this_cpu && sysctl_hardlockup_all_cpu_backtrace?
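
To make the question concrete, here is a small userspace model of one
possible reading of it. It is only a sketch and not kernel code: every
toy_* name is hypothetical, a plain 64-bit integer stands in for
struct cpumask, and the backtrace helper merely imitates the copy plus
exclude_self handling described above. The caller keeps no mask of its
own and always passes the online mask through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 8	/* toy value; the real limit is CONFIG_NR_CPUS */

/* Hypothetical stand-in for struct cpumask: one bit per CPU. */
typedef uint64_t toy_cpumask_t;

static toy_cpumask_t toy_online_mask = 0xff;	/* CPUs 0-7 online */
static unsigned int toy_this_cpu;		/* models smp_processor_id() */
static toy_cpumask_t toy_last_backtraced;	/* records who was asked to dump */

/*
 * Models the copy that the generic NMI backtrace helper already makes:
 * the caller's mask is copied into the helper's own static storage, and
 * the calling CPU is dropped when exclude_self is true.
 */
static void toy_arch_trigger_cpumask_backtrace(const toy_cpumask_t *mask,
					       bool exclude_self)
{
	static toy_cpumask_t backtrace_mask;

	backtrace_mask = *mask;
	if (exclude_self)
		backtrace_mask &= ~((toy_cpumask_t)1 << toy_this_cpu);
	toy_last_backtraced = backtrace_mask;
}

/*
 * The "all CPUs" dump without a caller-side mask. When the locked-up CPU
 * is the current one, exclude_self is enough. Otherwise this model passes
 * the online mask untouched, so the locked-up CPU is asked to dump a
 * second time; that trade-off is an assumption of this sketch.
 */
static void toy_dump_other_cpus(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);

	if (cpu == toy_this_cpu)
		toy_arch_trigger_cpumask_backtrace(&toy_online_mask, true);
	else
		toy_arch_trigger_cpumask_backtrace(&toy_online_mask, false);
}
```

With toy_this_cpu set to 2, toy_dump_other_cpus(2) leaves CPU 2 out of
the request, while toy_dump_other_cpus(5) asks every online CPU,
including CPU 5, to dump.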

> +		}
>  
>  		if (hardlockup_panic)
>  			nmi_panic(regs, "Hard LOCKUP");
> -- 
> 2.41.0.487.g6d72f3e995-goog
> 

-- 
Michal Hocko
SUSE Labs
