Date:   Fri, 11 Dec 2020 01:15:15 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: NOHZ tick-stop error: Non-RCU local softirq work is pending

On Thu, Dec 10, 2020 at 01:17:56PM -0800, Paul E. McKenney wrote:
> And please see attached.  Lots of output, in fact, enough that it
> was still dumping when the second instance happened.

Thanks!

So the issue is that ksoftirqd gets parked on CPU down while softirq
vectors are still pending. Either:

1) Ksoftirqd has stopped processing because it had too many vectors to
   handle and exceeded the time limit, but then it parks, leaving the
   rest unhandled.

2) Ksoftirqd has completed its work but something has raised a softirq
   after it got parked.

Can you run the following (on top of the previous patch and boot options)
so that we can see whether it still triggers, and with what? If it does,
we should be in case 2).

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 09229ad82209..7d558cb7a037 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -650,7 +650,9 @@ static void run_ksoftirqd(unsigned int cpu)
 		 * We can safely run softirq on inline stack, as we are not deep
 		 * in the task stack here.
 		 */
-		__do_softirq();
+		do {
+			__do_softirq();
+		} while (kthread_should_park() && local_softirq_pending());
 		local_irq_enable();
 		cond_resched();
 		return;
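
To illustrate why the loop above separates the two cases, here is a minimal
userspace sketch, not kernel code. Everything in it (struct sim_cpu,
sim_do_softirq(), sim_run_once(), sim_run_drain(), the budget parameter) is a
hypothetical stand-in: the budget models the time limit from case 1), and
should_park models kthread_should_park(). It assumes nothing raises new work
while draining.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's softirq state; not a kernel structure. */
struct sim_cpu {
	unsigned int pending;	/* units of softirq work still queued */
	bool should_park;	/* stand-in for kthread_should_park() */
};

/* Stand-in for __do_softirq(): handles at most 'budget' units per call. */
static void sim_do_softirq(struct sim_cpu *c, unsigned int budget)
{
	unsigned int n = c->pending < budget ? c->pending : budget;

	c->pending -= n;
}

/* Behaviour before the patch: a single pass, then return (and maybe park). */
static unsigned int sim_run_once(struct sim_cpu *c, unsigned int budget)
{
	sim_do_softirq(c, budget);
	return c->pending;
}

/* Behaviour with the patch: keep going while a park request is pending. */
static unsigned int sim_run_drain(struct sim_cpu *c, unsigned int budget)
{
	assert(budget > 0);	/* a zero budget would never make progress */
	do {
		sim_do_softirq(c, budget);
	} while (c->should_park && c->pending);
	return c->pending;
}
```

With 10 pending units, a budget of 3 and a park request set, sim_run_once()
leaves 7 units behind, which is case 1), while sim_run_drain() leaves none.
So if the warning still fires with the loop in place, the leftover work was
most likely raised after parking, i.e. case 2).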


Thanks!
