linux-kernel - Re: [PATCH] clocksource, prevent overflow in clocksource

Message-ID: <alpine.LFD.2.02.1204191451060.2542@ionos>
Date:	Thu, 19 Apr 2012 14:51:29 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Prarit Bhargava <prarit@...hat.com>
cc:	John Stultz <john.stultz@...aro.org>, linux-kernel@...r.kernel.org,
	Salman Qazi <sqazi@...gle.com>, stable@...nel.org
Subject: Re: [PATCH] clocksource, prevent overflow in clocksource_cyc2ns

On Thu, 19 Apr 2012, Thomas Gleixner wrote:

> On Wed, 18 Apr 2012, Prarit Bhargava wrote:
> > There's also some additional information that I've been gathering on this issue;
> > I have seen *idle* systems switch to the hpet because the clocksource watchdog
> > hits the overflow comparison.  As expected it happens much less frequently on
> > newer kernels (linux.git top of tree) than older stable kernels (2.6.32 based)
> > due to the difference in shift values but it is happening in both cases.
> > 
> > The odd thing about this behaviour is that I would expect it to occur with the
> > same frequency on small systems as it does on large systems with linux.git as
> > the watchdog fires once/second.  AFAICT I do not see this on small systems but
> > see it only on systems with greater than 24 cpus (both Intel and AMD).
> >
> > Using debug code similar to the dump code I previously provided, I can see that
> > every so often these large systems can hit a case where the tsc wraps and the
> > hpet is still monotonically increasing.  When the unstable calculation is
> > performed the result is obviously affected by the overflow.  Sometimes this
> > comparison overflow happens within 18 minutes, other times it can take hours or
> > days.
> 
> You are describing symptoms, but the root cause is obviously that the
> watchdog does not get invoked in time. The question is why.
> 
> Can you please add the patch below and enable scheduler, timer and irq
> events in the tracer. Tracing will stop once the watchdog triggers.
> 
> Please provide the traces. We need to understand the root cause of
> this idle wreckage.
> 
> Thanks,
> 
> 	tglx

-ENOPATCH :) 

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c958338..2214323 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -287,11 +287,15 @@ static void clocksource_watchdog(unsigned long data)
 		cs->cs_last = csnow;
 		cs->wd_last = wdnow;
 
+		trace_printk("wd %lld %lld cs %lld %lld\n", wdnow, wd_nsec,
+			     csnow, cs_nsec);
+
 		if (atomic_read(&watchdog_reset_pending))
 			continue;
 
 		/* Check the deviation from the watchdog clocksource. */
 		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
+			tracing_off();
 			clocksource_unstable(cs, cs_nsec - wd_nsec);
 			continue;
 		}
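
For readers following the thread, the overflow under discussion can be sketched in userspace. The conversion is (cycles * mult) >> shift, so the 64-bit product wraps once the cycle delta between two watchdog runs exceeds 2^64 / mult. That is why a late watchdog invocation matters, and why larger shift (and therefore mult) values overflow sooner. The helper names max_safe_cycles() and cyc2ns_overflows() below are hypothetical and only illustrate the arithmetic; they are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch of the cycles-to-nanoseconds conversion discussed
 * in this thread: (cycles * mult) >> shift. The multiplication is done
 * in 64 bits, so it silently wraps when cycles * mult >= 2^64.
 */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/*
 * Hypothetical helper: the largest cycle delta whose product with
 * mult still fits in 64 bits.
 */
static uint64_t max_safe_cycles(uint32_t mult)
{
	return mult ? UINT64_MAX / mult : UINT64_MAX;
}

/*
 * Hypothetical helper: true if converting this delta would wrap and
 * produce a bogus nanosecond value.
 */
static bool cyc2ns_overflows(uint64_t cycles, uint32_t mult)
{
	return cycles > max_safe_cycles(mult);
}
```

With an illustrative mult of 1 << 22 and shift of 22 (one cycle per nanosecond, not values taken from any real clocksource), max_safe_cycles() is 2^42 - 1, and a delta of 2^42 cycles converts to 0 ns instead of roughly 4398 seconds. A deviation check fed such a value would see a huge skew even though both clocks are fine.
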