Message-ID: <CA+55aFygv+PRcYScwCjGVQ7-0PA1mHOGYaKGKS4LBhxPm0YBJA@mail.gmail.com>
Date: Sun, 21 Dec 2014 15:58:18 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Dave Jones <davej@...emonkey.org.uk>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>, Chris Mason <clm@...com>,
Mike Galbraith <umgwanakikbuti@...il.com>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Dâniel Fraga <fragabr@...il.com>,
Sasha Levin <sasha.levin@...cle.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Suresh Siddha <sbsiddha@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
Peter Anvin <hpa@...ux.intel.com>
Subject: Re: frequent lockups in 3.18rc4
On Sun, Dec 21, 2014 at 2:32 PM, Dave Jones <davej@...emonkey.org.uk> wrote:
> On Sun, Dec 21, 2014 at 02:19:03PM -0800, Linus Torvalds wrote:
> >
> > And finally, and stupidly, is there any chance that you have anything
> > accessing /dev/hpet?
>
> Not knowingly at least, but who the hell knows what systemd has its
> fingers in these days.

Actually, it looks like /dev/hpet doesn't allow write access.

I can do the mmap(/dev/mem) thing and access the HPET by hand, and
when I write zero to it I immediately get something like this:

  Clocksource tsc unstable (delta = -284317725450 ns)
  Switched to clocksource hpet

just to confirm that yes, a jump in the HPET counter would indeed give
those kinds of symptoms: blaming the TSC with a negative delta in the
0-300s range, even though it's the HPET that is broken.
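
To make that arithmetic concrete, here is a simplified user-space model
of the comparison. It assumes a 32-bit HPET main counter running at the
common 14.318180 MHz rate (which wraps in roughly 300 seconds) and a
watchdog that takes masked counter deltas. The function names are made
up for illustration and are not the kernel's.

```c
#include <stdint.h>

/* Assumptions for this sketch, not read from any real hardware: */
#define HPET_HZ   14318180ULL     /* typical HPET tick rate */
#define HPET_MASK 0xffffffffULL   /* 32-bit main counter */

/* Elapsed HPET time in ns between two counter reads, with wraparound. */
static int64_t hpet_delta_ns(uint32_t last, uint32_t now)
{
    uint64_t cycles = ((uint64_t)now - last) & HPET_MASK;

    /* cycles < 2^32, so cycles * 1e9 fits in 64 bits */
    return (int64_t)(cycles * 1000000000ULL / HPET_HZ);
}

/*
 * What a watchdog would report as the TSC's error if it trusts the HPET:
 * the TSC interval minus the HPET interval over the same period.
 */
static int64_t apparent_tsc_skew_ns(int64_t tsc_delta_ns,
                                    uint32_t hpet_last, uint32_t hpet_now)
{
    return tsc_delta_ns - hpet_delta_ns(hpet_last, hpet_now);
}
```

If the counter is reset to zero between two reads, the masked delta
looks like a large forward step of up to one full wrap, so the TSC
appears to be behind by that much even though it measured the interval
correctly.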

And if the HPET then occasionally jumps around afterwards, it would
show up as ktime_get() occasionally going backwards, which in turn
would - as far as I can tell - result in exactly that pseudo-infinite
loop with timers.

Anyway, any wild kernel pointer access *could* happen to just hit the
HPET and write to the main counter value, although I'd personally be
more inclined to blame BIOS/SMM kind of code playing tricks with
time.. We do have a few places where we explicitly write the value on
purpose, but they are in the HPET init code, and in the clocksource
resume code, so they should not be involved.

Thomas - have you had reports of HPET breakage in RT circles, the same
way BIOSes have been tinkering with TSC?

Also, would it perhaps be a good idea to make "ktime_get()" save the
last time in a percpu variable, and warn if time ever goes backwards
on a particular CPU? A percpu thing should be pretty cheap, even if
we write to it every time somebody asks for time..
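
A rough user-space sketch of that idea, using a thread-local variable
as a stand-in for a percpu one. The names here are hypothetical, and a
real kernel version would sit inside ktime_get() and use the percpu
accessors instead.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for percpu state: one copy per thread (hypothetical names). */
static _Thread_local int64_t last_seen_ns;
static _Thread_local unsigned long backwards_count;

/*
 * Record the latest reading and warn if it is earlier than the previous
 * one seen on this thread. Returns 1 if time went backwards, 0 otherwise.
 */
static int check_time_monotonic(int64_t now_ns)
{
    int went_back = now_ns < last_seen_ns;

    if (went_back) {
        backwards_count++;
        fprintf(stderr, "time went backwards: %" PRId64 " -> %" PRId64 " ns\n",
                last_seen_ns, now_ns);
    }

    /* Always store the new value, so one jump produces one warning. */
    last_seen_ns = now_ns;
    return went_back;
}
```

The cost per call is one compare and one store to local state, which is
the reason a percpu variable should be cheap enough here.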
Linus