lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 24 Aug 2018 16:27:15 +0200
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Greg KH <gregkh@...uxfoundation.org>,
        Grygorii Strashko <grygorii.strashko@...com>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Anna-Maria Gleixner <anna-maria@...utronix.de>,
        stable@...r.kernel.org
Subject: Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline
 timer softirq

On Fri, Aug 24, 2018 at 09:01:02AM +0200, Thomas Gleixner wrote:
> On Fri, 24 Aug 2018, Greg KH wrote:
> > On Thu, Aug 23, 2018 at 05:57:06PM -0500, Grygorii Strashko wrote:
> > > This patch was back ported to the Stable linux-4.14.y and It causes regression -
> > >  flood of "NOHZ: local_softirq_pending" messages on all TI boards during boot (NFS boot):
> > > 
> > > [    4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256
> > > [    4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256
> 
> This printout is weird. Did you add something here?
> 
> > > the same is not reproducible with LKML - seems due to changes in tick-sched.c 
> > > __tick_nohz_idle_enter()/tick_nohz_irq_exit().
> > 
> > What changes do you think fixed this?
> > 
> > > I've generated backtrace from  can_stop_idle_tick() (see below) and seems this
> > > patch makes tick_nohz_irq_exit() call unconditional in case of nested interrupt:
> > > 
> > > gic_handle_irq
> > >  |- irq_exit
> > >     |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1]
> > >     |-__do_softirq 
> > > 	<irqs enabled>
> > > 	|- gic_handle_irq()
> > > 	   |- irq_exit()
> > > 		|- tick_irq_exit()
> > > 		   if (!in_irq()) <-- My understanding is that this condition will be always true due to [1]
> 
> Correct, but that's not the problem. The issue is that this happens in a
> softirq disabled region. Does the below fix it?
> 
> Thanks,
> 
> 	tglx
> 
> 8<--------------------
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 5b33e2f5c0ed..6aab9d54a331 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  	if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
>  		static int ratelimit;
>  
> -		if (ratelimit < 10 &&
> +		if (ratelimit < 10 && !in_softirq() &&
>  		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>  			pr_warn("NOHZ: local_softirq_pending %02x\n",
>  				(unsigned int) local_softirq_pending());
> 
> 

Good catch! In 4.14 Rafael hadn't yet changed the path where we stop the idle tick.
We were still stopping it from irq exit and so we could do that while interrupting a
softirq. So we may need to backport this along with "nohz: Fix missing tick reprogram
when interrupting an inline softirq".

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ