lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20210922143206.GA21336@pathway.suse.cz>
Date:   Wed, 22 Sep 2021 16:32:06 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Jinhui Guo <guojinhui@...wei.com>
Cc:     akpm@...ux-foundation.org, peterz@...radead.org,
        valentin.schneider@....com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] watchdog/softlockup: Fix softlockup_stop_all() hungtask
 bug

On Wed 2021-09-22 17:52:02, Jinhui Guo wrote:
> If NR_CPUS equal to 1, it would trigger hungtask, it can be
> triggered by follow command:
> 	echo 0 > /proc/sys/kernel/watchdog
> 	echo 1 > /proc/sys/kernel/watchdog
> The hungtask stack:
> 	__schedule
> 	schedule
> 	schedule_timeout
> 	__wait_for_common
> 	softlockup_stop_fn
> 	lockup_detector_reconfigure
> 	proc_watchdog_common
> 	proc_watchdog
> 	proc_sys_call_handler
> 	vfs_write
> 	ksys_write
> The watchdog_allowed_mask is completely cleared when the
> watchdog is disabled. But the macro for_each_cpu() assume
> all masks are "1" when macro NR_CPUS equal to 1. It makes
> watchdog_allowed_mask not work at all.
> 
> Fixes: be45bf5395e0 ("watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug")
> 
> Cc: Petr Mladek <pmladek@...e.com>
> Reported-by: kernel test robot <lkp@...el.com>
> Signed-off-by: Jinhui Guo <guojinhui@...wei.com>
> ---
>  arch/x86/include/asm/smp.h | 5 +++++
>  include/linux/cpumask.h    | 5 +++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index 630ff08532be..f5d3ca5696b3 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -21,7 +21,12 @@ DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
>  
>  static inline struct cpumask *cpu_llc_shared_mask(int cpu)
>  {
> +#ifdef CONFIG_SMP
>  	return per_cpu(cpu_llc_shared_map, cpu);
> +#else
> +	/* cpu_llc_shared_map is not defined while !CONFIG_SMP */
> +	return cpu_all_mask;
> +#endif

This looks dangerous. The cpumask returned can be modified,
for example, in

static void remove_siblinginfo(int cpu)
{
[...]
	for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
		cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
[...]
	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
[...]
}

With you patch, this code would clear "cpu_all_mask" which
is wrong!

We need to fix this another way. I see few possibilities:

1. Define cpu_llc_shared_map even with NR_CPUS == 1.

2. Do not compile the problematic code with NR_CPUS == 1.
   It seems to be needed for CPU hotplug so it does not
   make sense to have it.

3. Define for_each_cpu_optimized() that would behave the same
   way as the non-patched for_each_cpu(). It can be
   used in remove_siblinginfo().

4. Do not patch for_each_cpu(). Instead, add for_each_cpu_safe()
   that would work correctly even with NR_CPUS == 1.
   It can than be used in watchdog.c.


IMHO, the 2nd solution would be the best if the change
is simple.

The 3rd solution is my other favorite. It would keep
the default for_each_cpu() safe. People might use
for_each_cpu_optimized() only when they know what
they are doing. It will be easy to find these
locations.

Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ