Date:   Thu, 10 Dec 2020 16:46:38 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: NOHZ tick-stop error: Non-RCU local softirq work is pending

On Fri, Dec 11, 2020 at 01:15:15AM +0100, Frederic Weisbecker wrote:
> On Thu, Dec 10, 2020 at 01:17:56PM -0800, Paul E. McKenney wrote:
> > And please see attached.  Lots of output, in fact, enough that it
> > was still dumping when the second instance happened.
> 
> Thanks!
> 
> So the issue is that ksoftirqd is parked on CPU down with vectors
> still pending. Either:
> 
> 1) Ksoftirqd exited because it had too much to process and exceeded
>    the time limit, but then it parked, leaving the rest unhandled.
> 
> 2) Ksoftirqd has completed its work but something has raised a softirq
>    after it got parked.
> 
> Can you run the following (on top of the previous patch and boot options)
> so that we can see whether (and what) it still triggers (in which case we should be in case 2).

Thank you!  I have started it up.

> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 09229ad82209..7d558cb7a037 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -650,7 +650,9 @@ static void run_ksoftirqd(unsigned int cpu)
>  		 * We can safely run softirq on inline stack, as we are not deep
>  		 * in the task stack here.
>  		 */
> -		__do_softirq();
> +		do {
> +			__do_softirq();
> +		} while (kthread_should_park() && local_softirq_pending());
>  		local_irq_enable();
>  		cond_resched();
>  		return;

Huh.  I guess that self-propagating timers, RCU callbacks, and the
like are non-problems because they cannot retrigger while interrupts
are disabled?  But can these things reappear just after the
local_irq_enable()?

In the case of RCU, softirq would need to run on this CPU, which it won't,
so we are good in that case.  (Any stranded callbacks will be requeued
onto some other CPU later in the CPU-hotplug offline processing.)

							Thanx, Paul

> Thanks!
