lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080423085030.GA22309@elte.hu>
Date:	Wed, 23 Apr 2008 10:50:30 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	David Miller <davem@...emloft.net>
Cc:	linux-kernel@...r.kernel.org, tglx@...utronix.de,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: [patch] softlockup: fix false positives on nohz if CPU is 100%
	idle for more than 60 seconds


* David Miller <davem@...emloft.net> wrote:

> > +++ linux/kernel/time/tick-sched.c
> > @@ -393,6 +393,7 @@ void tick_nohz_restart_sched_tick(void)
> >  		sub_preempt_count(HARDIRQ_OFFSET);
> >  	}
> >  
> > +	touch_softlockup_watchdog();
> >  	/*
> >  	 * Cancel the scheduled timer and restore the tick
> >  	 */
> 
> The NOHZ lockup warnings are gone.  But this seems like a band-aid.  
> We made sure that cpus don't get into this state via commit:

but that commit:

> @@ -133,6 +133,8 @@ void tick_nohz_update_jiffies(void)
>  	if (!ts->tick_stopped)
>  		return;
>  
> +	touch_softlockup_watchdog();
> +
>  	cpu_clear(cpu, nohz_cpu_mask);
>  	now = ktime_get();

updates the softlockup watchdog before we update jiffies:

>       ts->idle_waketime = now;
>
>       local_irq_save(flags);
>       tick_do_update_jiffies64(now);
>       local_irq_restore(flags);

so the patch i gave you first should do the trick - the cleaner patch is 
attached below.

> While what the guilty patch we're discussing here does is change how 
> cpu_clock() is computed, that's it.  softlockup uses cpu_clock() to 
> calculate it's timestamp.  The guilty change modified nothing about 
> when touch_softlockup_watchdog() is called, nor any other aspect about 
> how the softlockup mechanism itself works.
> 
> So we need to figure out why in the world changing how cpu_clock() 
> gets calculated makes a difference.

the thing is that that change _fixed_ cpu_clock() - and exposed the 
latent softlockup/nohz interaction that was always there and which 
caused false positives if a CPU stayed idle for more than 60 seconds.

	Ingo

------------------>
Subject: softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
From: Ingo Molnar <mingo@...e.hu>
Date: Wed Apr 23 10:01:08 CEST 2008

David Miller reported softlockup false positives on his 128-way Niagara2 
system. The special thing about that system is extremely long clockevent 
timeouts combined with extremly long idle times.

Fix rq->clock update bug that can trigger on such a system: in 
tick_nohz_update_jiffies() [which is called on all irq entry on all cpus 
where the irq entry hits an idle cpu] we call 
touch_softlockup_watchdog() before we update jiffies. That works fine 
most of the time when idle timeouts are within 60 seconds. But when an 
idle timeout is beyond 60 seconds, jiffies is updated with a jump of 
more than 60 seconds, which causes a jump in cpu-clock of more than 60 
seconds, triggering a false positive.

Reported-by: David Miller <davem@...emloft.net>
Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 kernel/time/tick-sched.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -133,8 +133,6 @@ void tick_nohz_update_jiffies(void)
 	if (!ts->tick_stopped)
 		return;
 
-	touch_softlockup_watchdog();
-
 	cpu_clear(cpu, nohz_cpu_mask);
 	now = ktime_get();
 	ts->idle_waketime = now;
@@ -142,6 +140,8 @@ void tick_nohz_update_jiffies(void)
 	local_irq_save(flags);
 	tick_do_update_jiffies64(now);
 	local_irq_restore(flags);
+
+	touch_softlockup_watchdog();
 }
 
 void tick_nohz_stop_idle(int cpu)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ