Message-ID: <1279854789.2442.54.camel@localhost.localdomain>
Date:	Thu, 22 Jul 2010 20:13:09 -0700
From:	john stultz <johnstul@...ibm.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	lkml <linux-kernel@...r.kernel.org>,
	Darren Hart <dvhltc@...ibm.com>,
	Paul Clarke <pacman@...ibm.com>
Subject: [RFC][PATCH -rt] Handling delayed clocksource watchdog timer

Hey Thomas,
	I just wanted to run this by you to see if you have any thoughts on
how to better handle a situation we came across.

Now, first of all, with rt throttling enabled this is less likely to
bite folks, but it's possible that folks who disable the throttling
could run into trouble.

We ran into a case where a very heavy -rt test load, which was normally
run on a multi-processor system, was moved over to a single-core system.
The box basically locks up for a while while the test load churns away
(lots of SCHED_FIFO tasks, basically hogging the cpu).

After some hours, when the test completes, we noticed that even on
systems with stable TSCs we were seeing "Clocksource tsc unstable
(delta = XYZ ns)" messages, and the system had fallen back to a slower
clocksource.

Digging into the clocksource watchdog code, I realized that the watchdog
timer was being drastically delayed. So long, in fact, that the
clocksource watchdog hardware would wrap (while the TSC would not). This
then caused a great disparity between the clocksource interval and the
watchdog interval, and thus the TSC was being marked unstable.

The solution is a little more difficult to figure out, because with the
delay we run into the problem that the watchdog hardware is no longer
reliable. That makes it difficult to distinguish cases where the
clocksource is actually bad from cases where the timer was merely
delayed.

The following patch tries to ignore cases where the timer was delayed by
more than 4x the watchdog interval. However, it's not really perfect,
since jiffies may be driven by the clocksource; so if the clocksource is
running really fast, it's possible that jiffies would increment quickly
enough that we'd think we had just been delayed a long time.

I also considered skipping cases where the wd_nsec value was too large
or too small, but again, as the timer is driven by the possibly bad
clocksource, it seemed we could get false negatives and miss the bad
hardware.

Any better ideas?

thanks
-john

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 0e98497..a91e7ba 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -280,7 +280,8 @@ static void clocksource_watchdog(unsigned long data)
 		cs_nsec = clocksource_cyc2ns((csnow - cs->wd_last) &
 					     cs->mask, cs->mult, cs->shift);
 		cs->wd_last = csnow;
-		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+		if ((jiffies - watchdog_timer.expires < 4*WATCHDOG_INTERVAL) &&
+				(abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
 			clocksource_unstable(cs, cs_nsec - wd_nsec);
 			continue;
 		}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
