Date:	Thu, 19 Apr 2012 14:37:41 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Prarit Bhargava <prarit@...hat.com>
cc:	John Stultz <john.stultz@...aro.org>, linux-kernel@...r.kernel.org,
	Salman Qazi <sqazi@...gle.com>, stable@...nel.org
Subject: Re: [PATCH] clocksource, prevent overflow in clocksource_cyc2ns

On Wed, 18 Apr 2012, Prarit Bhargava wrote:
> There's also some additional information that I've been gathering on this issue;
> I have seen *idle* systems switch to the hpet because the clocksource watchdog
> hits the overflow comparison.  As expected it happens much less frequently on
> newer kernels (linux.git top of tree) than older stable kernels (2.6.32 based)
> due to the difference in shift values but it is happening in both cases.
> 
> The odd thing about this behaviour is that I would expect it to occur with the
> same frequency on small systems as it does on large systems with linux.git as
> the watchdog fires once/second.  AFAICT I do not see this on small systems but
> see it only on systems with greater than 24 cpus (both Intel and AMD).
>
> Using debug code similar to the dump code I previously provided, I can see that
> every so often these large systems can hit a case where the tsc wraps and the
> hpet is still monotonically increasing.  When the unstable calculation is
> performed the result is obviously affected by the overflow.  Sometimes this
> comparison overflow happens within 18 minutes, other times it can take hours or
> days.

You are describing symptoms, but the root cause is obviously that the
watchdog does not get invoked in time. The question is why.
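
For reference, a minimal user-space sketch of why a late watchdog matters.
It assumes the usual (cycles * mult) >> shift conversion done by
clocksource_cyc2ns(); the helper names max_safe_cycles() and
cyc2ns_would_overflow() are hypothetical, and any mult/shift values fed to
it are illustrative, not taken from the affected machines. Once the cycle
delta between two watchdog runs exceeds UINT64_MAX / mult, the 64-bit
product wraps and the nanosecond result is garbage.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as clocksource_cyc2ns(): scale cycles to nanoseconds. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	/* The 64-bit product silently wraps if cycles is too large. */
	return (cycles * mult) >> shift;
}

/* Hypothetical helper: largest cycle delta whose product still fits. */
static uint64_t max_safe_cycles(uint32_t mult)
{
	return UINT64_MAX / mult;
}

/* Hypothetical helper: true if cycles * mult would not fit in 64 bits. */
static bool cyc2ns_would_overflow(uint64_t cycles, uint32_t mult)
{
	return cycles > max_safe_cycles(mult);
}
```

With a larger shift (and therefore a larger mult) the safe window shrinks,
which would fit the observation that kernels with different shift values hit
the problem at different rates. It does not explain why the watchdog runs
late in the first place.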

Can you please add the patch below and enable scheduler, timer and irq
events in the tracer? Tracing will stop once the watchdog triggers.

Please provide the traces. We need to understand the root cause of
this idle wreckage.

Thanks,

	tglx