linux-kernel - Re: perf related boot hang.

Message-ID: <20140806162308.GD14261@redhat.com>
Date:	Wed, 6 Aug 2014 12:23:08 -0400
From:	Dave Jones <davej@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: perf related boot hang.

On Wed, Aug 06, 2014 at 06:19:34PM +0200, Peter Zijlstra wrote:
 > On Wed, Aug 06, 2014 at 10:36:21AM -0400, Dave Jones wrote:
 > > On Linus' current tree, when I cold-boot one of my boxes, it locks up
 > > during boot with this trace:
 > > 
 > > Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
 > > CPU: 2 PID: 577 Comm: in:imjournal Not tainted 3.16.0+ #33
 > >  ffff880244c06c88 000000008b73013e ffff880244c06bf0 ffffffffb47ee207
 > >  ffffffffb4c51118 ffff880244c06c78 ffffffffb47ebcf8 0000000000000010
 > >  ffff880244c06c88 ffff880244c06c20 000000008b73013e 0000000000000000
 > > Call Trace:
 > >  <NMI>  [<ffffffffb47ee207>] dump_stack+0x4e/0x7a
 > >  [<ffffffffb47ebcf8>] panic+0xd4/0x207
 > >  [<ffffffffb4145448>] watchdog_overflow_callback+0x118/0x120
 > >  [<ffffffffb4186f0e>] __perf_event_overflow+0xae/0x350
 > >  [<ffffffffb4185380>] ? perf_event_task_disable+0xa0/0xa0
 > >  [<ffffffffb401a4ef>] ? x86_perf_event_set_period+0xbf/0x150
 > >  [<ffffffffb4187d34>] perf_event_overflow+0x14/0x20
 > >  [<ffffffffb40203a6>] intel_pmu_handle_irq+0x206/0x410
 > >  [<ffffffffb401939b>] perf_event_nmi_handler+0x2b/0x50
 > >  [<ffffffffb4007b72>] nmi_handle+0xd2/0x390
 > >  [<ffffffffb4007aa5>] ? nmi_handle+0x5/0x390
 > >  [<ffffffffb40d8301>] ? lock_acquired+0x131/0x450
 > >  [<ffffffffb4008062>] default_do_nmi+0x72/0x1c0
 > > 
 > > 
 > > If I reset it, it then seems to always boot up fine.
 > 
 > Uhm, cute! And that's the entire stacktrace? It would seem to me there
 > would be at least a 'task' context below that. CPUs simply do not _only_
 > run NMI code, and that trace starts at default_do_nmi().
 
There may have been more to follow, but the machine had locked up solid,
so I couldn't get any more output.  Next time I see it, I'll go check
the console to see if there's anything extra.

Curiously, I just hit another NMI-related bug (see other mail) while fuzzing.

	Dave
