Date:   Tue, 8 Feb 2022 14:17:03 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Paul Menzel <pmenzel@...gen.mpg.de>
Cc:     Frederic Weisbecker <fweisbec@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linuxppc-dev@...ts.ozlabs.org
Subject: Re: ppc64le: `NOHZ tick-stop error: Non-RCU local softirq work is
 pending, handler #20!!!` when turning off SMT

On Tue, Feb 08, 2022 at 08:32:37AM +0100, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> On the POWER8 server IBM S822LC running Ubuntu 21.10, Linux 5.17-rc1+ built
> with
> 
>     $ grep HZ /boot/config-5.17.0-rc1+
>     CONFIG_NO_HZ_COMMON=y
>     # CONFIG_HZ_PERIODIC is not set
>     CONFIG_NO_HZ_IDLE=y
>     # CONFIG_NO_HZ_FULL is not set
>     CONFIG_NO_HZ=y
>     # CONFIG_HZ_100 is not set
>     CONFIG_HZ_250=y
>     # CONFIG_HZ_300 is not set
>     # CONFIG_HZ_1000 is not set
>     CONFIG_HZ=250
> 
> once warned about a NOHZ tick-stop error, when I executed `sudo
> /usr/sbin/ppc64_cpu --smt=off` (so that KVM would work).

I see, so I assume this sets some CPUs offline, right?

> 
> ```
> $ dmesg
> [    0.000000] Linux version 5.17.0-rc1+ (pmenzel@...ghafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
> […]
> [271272.030262] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271272.305726] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271272.549790] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271274.885167] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.113896] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.412902] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.625245] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.833107] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271276.041391] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271277.244880] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> ```

That's IRQ_POLL_SOFTIRQ (0x20 is bit 5 of the pending mask). The problem here
is probably that some of these softirqs are still pending even though
ksoftirqd has already been parked.
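
For anyone who wants to decode such a value themselves, here is a minimal
sketch. It assumes the number after `handler #` is the pending softirq mask
printed in hex, and that the bit order follows the softirq enum in
include/linux/interrupt.h. The helper name decode_pending() is made up for
this mail and is not a kernel or tool API.

```python
# Softirq names, in the order of the kernel's softirq enum
# (include/linux/interrupt.h); list index == bit number in the pending mask.
SOFTIRQ_NAMES = [
    "HI_SOFTIRQ",
    "TIMER_SOFTIRQ",
    "NET_TX_SOFTIRQ",
    "NET_RX_SOFTIRQ",
    "BLOCK_SOFTIRQ",
    "IRQ_POLL_SOFTIRQ",
    "TASKLET_SOFTIRQ",
    "SCHED_SOFTIRQ",
    "HRTIMER_SOFTIRQ",
    "RCU_SOFTIRQ",
]


def decode_pending(handler_field):
    """Hypothetical helper: map the hex mask after 'handler #' to softirq names."""
    mask = int(handler_field, 16)  # assumption: the warning prints the mask in hex
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


print(decode_pending("20"))  # ['IRQ_POLL_SOFTIRQ']
```

If several softirqs were pending at once, the same helper would list all of
them, since each set bit is reported separately.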

I see there is irq_poll_cpu_dead() that migrates the pending queue once
the CPU is finally dead, so this is well handled.

I'm preparing a patch to fix the warning.

Thanks.
