Date:	Wed, 3 Dec 2014 11:25:29 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Dave Jones <davej@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Chris Mason <clm@...com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	John Stultz <john.stultz@...aro.org>
Subject: Re: frequent lockups in 3.18rc4

On Wed, Dec 3, 2014 at 11:00 AM, Dave Jones <davej@...hat.com> wrote:
>
> So right after sending my last mail, I rebooted, and restarted the run
> on the same kernel again.
>
> As I was writing this mail, this happened.
>
> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]
>
> and that's all that made it over the console. I couldn't log in via ssh,
> and thought "ah-ha, so it IS bad".  I walked over to reboot it, and
> found I could actually log in on the console. check out this dmesg..
>
> [  503.683055] Clocksource tsc unstable (delta = -95946009388 ns)
> [  503.692038] Switched to clocksource hpet
> [  524.420897] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trinity-c178:20182]

Interesting. That whole NMI watchdog thing happens pretty much 22s
after the "TSC unstable" message.

Have you ever seen that TSC issue before? The watchdog relies on
comparing get_timestamp() differences, so if the timestamp was
incorrect...
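
For illustration, here is a simplified model of that kind of check. It is
not the kernel's code: the function name, the second-granularity
timestamps, and the 20 second threshold are all assumptions made for this
sketch. The point is only that the reported "stuck" time is a difference of
two clock readings, so any error in the clock shows up directly in the
result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold in seconds, chosen only for this sketch. */
#define SOFTLOCKUP_THRESH_SEC 20

/*
 * Model of a soft-lockup check: compare the current timestamp with the
 * timestamp of the last time the watchdog was "touched". Both values are
 * assumed to be in seconds and to come from the same clock.
 */
static bool looks_soft_locked(uint64_t now_sec, uint64_t touch_sec)
{
    /* A clock that went backwards is treated as "no time elapsed". */
    if (now_sec < touch_sec)
        return false;

    /* The elapsed time is only as trustworthy as the clock behind it. */
    return (now_sec - touch_sec) > SOFTLOCKUP_THRESH_SEC;
}
```

With this model, looks_soft_locked(122, 100) fires after 22 seconds, and a
forward jump in the clock between two readings would make it fire even if
the CPU was never actually stuck.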

Maybe that whole "clocksource_watchdog()" is bogus. That delta is
about 96 seconds, which sounds very odd. I'm not seeing how the TSC could
actually screw up that badly, so I'd almost be more likely to blame the
"watchdog" clock.

I don't know. This piece of code:

        delta = clocksource_delta(wdnow, cs->wd_last, watchdog->mask);

makes no sense to me. Shouldn't it be

        delta = clocksource_delta(wdnow, watchdog->wd_last, watchdog->mask);
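
As a sketch of why the choice of "last" value matters: a masked delta is
only meaningful when "now" and "last" are readings of the same counter and
the mask matches that counter's width. The helper below is a stand-in
written for this example (masked_delta is a hypothetical name, not the
kernel function), and the 32-bit mask is an assumption about the watchdog
counter's width.

```c
#include <stdint.h>

/*
 * Wrap-safe difference between two readings of a free-running counter
 * whose valid bits are given by mask. Both readings must come from the
 * same counter for the result to mean anything.
 */
static uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}
```

For a 32-bit counter that wrapped from 0xFFFFFFF0 to 5, this gives 21
cycles, as expected. If "last" instead came from some other counter, say
1000 against a "now" of 500, the same arithmetic yields 4294966796 cycles,
which is the kind of huge bogus delta that would make a clock look
unstable.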

Thomas? John?

                  Linus