lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 21 Dec 2014 16:41:18 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Dave Jones <davej@...emonkey.org.uk>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>, Chris Mason <clm@...com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Suresh Siddha <sbsiddha@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Peter Anvin <hpa@...ux.intel.com>
Subject: Re: frequent lockups in 3.18rc4

On Sun, Dec 21, 2014 at 3:58 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> I can do the mmap(/dev/mem) thing and access the HPET by hand, and
> when I write zero to it I immediately get something like this:
>
>   Clocksource tsc unstable (delta = -284317725450 ns)
>   Switched to clocksource hpet
>
> just to confirm that yes, a jump in the HPET counter would indeed give
> those kinds of symptoms:blaming the TSC with a negative delta in the
> 0-300s range, even though it's the HPET that is broken.
>
> And if the HPET then occasionally jumps around afterwards, it would
> show up as ktime_get() occasionally going backwards, which in turn
> would - as far as I can tell - result in exactly that pseudo-infirnite
> loop with timers.

Ok, so I tried that too.

It's actually a pretty easy experiment to do: just mmap(/dev/mem) at
the HPET offset (the kernel prints it out at boot, it should normally
be at 0xfed00000). And then just write a zero to offset 0xf0, which is
the main counter.

The first time, you get the "Clocksource tsc unstable".

The second time (or third, or fourth - it might not take immediately)
you get a lockup or similar. Bad things happen.

This is *not* to say that this is the bug you're hitting. But it does show that

 (a) a flaky HPET can do some seriously bad stuff
 (b) the kernel is very fragile wrt time going backwards.

and maybe we can use this test program to at least try to alleviate problem (b).

Trivial HPET mess-up program attached.

                                Linus

View attachment "hpet-mess.c" of type "text/x-csrc" (479 bytes)

Powered by blists - more mailing lists