Date:	Sat, 03 Jan 2015 09:58:30 -0500
From:	Sasha Levin <sasha.levin@...cle.com>
To:	John Stultz <john.stultz@...aro.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Dave Jones <davej@...emonkey.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>, Chris Mason <clm@...com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Suresh Siddha <sbsiddha@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Peter Anvin <hpa@...ux.intel.com>
Subject: Re: frequent lockups in 3.18rc4

On 01/02/2015 07:27 PM, John Stultz wrote:
> On Fri, Dec 26, 2014 at 12:57 PM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>> On Fri, Dec 26, 2014 at 10:12 AM, Dave Jones <davej@...emonkey.org.uk> wrote:
>>> On Fri, Dec 26, 2014 at 11:34:10AM -0500, Dave Jones wrote:
>>>
>>> > One thing I think I'll try is to try and narrow down which
>>> > syscalls are triggering those "Clocksource hpet had cycles off"
>>> > messages.  I'm still unclear on exactly what is doing
>>> > the stomping on the hpet.
>>>
>>> First I ran trinity with "-g vm" which limits it to use just
>>> a subset of syscalls, specifically VM related ones.
>>> That triggered the messages. Further experiments revealed:
>>
>> So I can trigger the false positives with my original patch quite
>> easily by just putting my box under some load. My numbers are nowhere
>> near as bad as yours, but then, I didn't put it under as much load
>> anyway. Just a regular "make -j64" of the kernel.
>>
>> I suspect your false positives are bigger partly because of the load,
>> but mostly because you presumably have preemption enabled too. I don't
>> do preemption in my normal kernels, and that limits the damage of the
>> race a bit.
>>
>> I have a newer version of the patch that gets rid of the false
>> positives with some ordering rules instead, and just for you I hacked
>> it up to say where the problem happens too, but it's likely too late.
>>
>> The fact that the original racy patch seems to make a difference for
>> you does say that yes, we seem to be zeroing in on the right area
>> here, but I'm not seeing what's wrong. I was hoping for big jumps from
>> your HPET, since your "TSC unstable" messages do kind of imply that
>> such really big jumps can happen.
>>
>> I'm attaching my updated hacky patch, although I assume it's much too
>> late for that machine. Don't look too closely at the backtrace
>> generation part, that's just a quick hack, and only works with frame
>> pointers enabled anyway.
>>
>> So I'm still a bit unhappy about not figuring out *what* is wrong. And
>> I'd still like the dmidecode from that machine, just for posterity. In
>> case we can figure out some pattern.
>>
>> So right now I can imagine several reasons:
>>
>>  - actual hardware bug.
>>
>>    This is *really* unlikely, though. It should hit everybody. The
>> HPET is in the core Intel chipset, we're not talking random unusual
>> hardware by fly-by-night vendors here.
>>
>>  - some SMM/BIOS "power management" feature.
>>
>>    We've seen this before, where the SMM saves/restores the TSC on
>> entry/exit in order to hide itself from the system. I could imagine
>> similar code for the HPET counter. SMM writers use some bad drugs to
>> dull their pain.
>>
>>    And with the HPET counter, since it's not even per-CPU, the "save
>> and restore HPET" will actually show up as "HPET went backwards" to
>> the other non-SMM CPUs if it happens.
>>
>>  - a bug in our own clocksource handling.
>>
>>    I'm not seeing it. But maybe my patch hides it for some magical reason.
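For what it's worth, the SMM save/restore scenario above is easy to model. This is a toy sketch (not kernel code, and the cycle numbers are made up): because the HPET main counter is a single shared counter rather than per-CPU, an SMM handler that stashes it on entry and writes it back on exit is visible to every other CPU as the counter running backwards.

```python
# Toy model of the SMM save/restore idea: one shared free-running
# counter, read by all CPUs. An SMM handler on CPU0 saves the count on
# entry and restores it on exit to hide the time it spent; CPU1, which
# is not in SMM, sees the counter jump backwards across the restore.

class Hpet:
    """Single shared main counter (not per-CPU)."""
    def __init__(self):
        self.counter = 0

    def tick(self, cycles):
        self.counter += cycles

    def read(self):
        return self.counter

hpet = Hpet()
hpet.tick(1000)

saved = hpet.read()       # SMM entry on CPU0: stash the current count (1000)
hpet.tick(50)             # real time keeps passing while SMM runs

before = hpet.read()      # CPU1 happens to sample here: 1050
hpet.counter = saved      # SMM exit on CPU0: restore the stashed count
after = hpet.read()       # CPU1 samples again: 1000

assert after < before     # to CPU1, the HPET "went backwards" by 50 cycles
print(before - after)     # -> 50
```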
> So I sent out a first step validation check to warn us if we end up
> with idle periods that are larger than we expect.
> 
> It doesn't yet cap the timekeeping_get_ns() output (like your patch
> effectively does), but it would be easy to do that in a following
> patch.
> 
> I did notice while testing this that the max_idle_ns (max idle time we
> report to the scheduler) for the hpet is only ~16 sec, and we'll
> overflow after just ~21 seconds. This second number maps closely to the
> 22 second stalls seen in the NMI watchdog reports, which seems
> interesting, but I also realize that qemu uses a 100MHz hpet, whereas
> real hardware is likely to be a bit slower, so maybe that's just
> chance...
> 
> I'd be interested if folks seeing anything similar to Dave would give
> my patch a shot.
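As a back-of-the-envelope check on the ~21 second figure above (assuming a 32-bit HPET main counter at the 100 MHz QEMU rate John mentions; the exact margins the clocksource code applies are not reproduced here):

```python
# Rough arithmetic for a hypothetical 32-bit HPET main counter at 100 MHz.
hz = 100_000_000

wrap_s = 2**32 / hz        # full counter wrap: ~42.9 s
half_wrap_s = 2**31 / hz   # beyond half the counter range a delta is
                           # ambiguous (can't tell a wrap from elapsed
                           # time): ~21.5 s

print(round(half_wrap_s, 1))   # -> 21.5
```

The ~21.5 s half-range figure lines up with the ~21 second overflow John quotes; the smaller ~16 s max_idle_ns presumably reflects additional safety margin applied on top of this.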

I ran it overnight, but I didn't see any of the new warnings in the logs.


Thanks,
Sasha