linux-kernel - Re: [V2 PATCH 0/6] x86, NMI: give NMI handler a face-lift

Message-ID: <20101116184325.GB4823@redhat.com>
Date:	Tue, 16 Nov 2010 13:43:25 -0500
From:	Don Zickus <dzickus@...hat.com>
To:	Jason Wessel <jason.wessel@...driver.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
	Robert Richter <robert.richter@....com>, ying.huang@...el.com,
	Andi Kleen <andi@...stfloor.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: [V2 PATCH 0/6] x86, NMI: give NMI handler a face-lift

On Fri, Nov 12, 2010 at 12:27:55PM -0500, Don Zickus wrote:

Hi Jason,

> 
> > 
> > I tested 2.6.35 and it does not hard hang, but suffered from a different
> > problem with a perf API change.   The kgdb tests appear to loop and loop
> > emitting endless streams of output in 2.6.35 and I already have that
> > problem patched.

I keep getting the following stack trace, which is different from your
hang.  Is the looping I am seeing caused by the NMI changes or by kgdb?

Cheers,
Don

> 
> It doesn't look like this, does it?  This is the streaming output I see
> when I try to reproduce this using the config suggestions you gave me.
> 
> [    7.778578] ------------[ cut here ]------------
> [    7.778580] WARNING: at /ssd/dzickus/git/upstream/drivers/misc/kgdbts.c:702 run_simple_test+0x18d/0x2f0()
> [    7.778582] Hardware name: To be filled by O.E.M.
> [    7.778583] Modules linked in: ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mod
> [    7.778589] Pid: 150, comm: udevd Tainted: G        W   2.6.36-killnmi+ #12
> [    7.778590] Call Trace:
> [    7.778591]  <#DB>  [<ffffffff810631cf>] warn_slowpath_common+0x7f/0xc0
> [    7.778595]  [<ffffffff8106322a>] warn_slowpath_null+0x1a/0x20
> [    7.778598]  [<ffffffff8132941d>] run_simple_test+0x18d/0x2f0
> [    7.778600]  [<ffffffff81328ded>] kgdbts_put_char+0x1d/0x20
> [    7.778603]  [<ffffffff810c6cbd>] put_packet+0x5d/0x120
> [    7.778605]  [<ffffffff810c7f44>] gdb_serial_stub+0xa24/0xc20
> [    7.778609]  [<ffffffff810c6558>] kgdb_cpu_enter+0x2c8/0x590
> [    7.778612]  [<ffffffff810c6a91>] kgdb_handle_exception+0x121/0x170
> [    7.778615]  [<ffffffff814cd7b8>] ? hw_breakpoint_exceptions_notify+0xe8/0x1d0
> [    7.778617]  [<ffffffff81033472>] __kgdb_notify+0x82/0x1b0
> [    7.778620]  [<ffffffff810335c7>] kgdb_notify+0x27/0x40
> [    7.778623]  [<ffffffff814cf8e5>] notifier_call_chain+0x55/0x80
> [    7.778625]  [<ffffffff814cf958>] __atomic_notifier_call_chain+0x48/0x70
> [    7.778628]  [<ffffffff814cf996>] atomic_notifier_call_chain+0x16/0x20
> [    7.778631]  [<ffffffff814cf9ce>] notify_die+0x2e/0x30
> [    7.778633]  [<ffffffff814cc953>] do_debug+0xa3/0x170
> [    7.778636]  [<ffffffff814cc438>] debug+0x28/0x40
> [    7.778639]  [<ffffffff81062310>] ? do_fork+0x0/0x450
> [    7.778640]  <<EOE>>  [<ffffffff81014938>] ? sys_clone+0x28/0x30
> [    7.778644]  [<ffffffff8100c4d3>] stub_clone+0x13/0x20
> [    7.778647]  [<ffffffff8100c1b2>] ? system_call_fastpath+0x16/0x1b
> [    7.778649] ---[ end trace ecf07e0cd1846c34 ]---
> [    7.778650] kgdbts: ERROR: beyond end of test on 'do_fork_test' line 11
> [    7.778651] ------------[ cut here ]------------
> 
> > 
> > > We have to get back to a working baseline.  With 2.6.37-rc1, the last
> > > remaining problem is the perf + lockup detector callback eating the
> > > injected DIE_NMI event, which is meant to enter the debugger.
> 
> This shouldn't be too hard to solve once we figure out which path it takes
> in the perf nmi handler.
> 
> Cheers,
> Don
> 
> > 
> > 
> > >> The symptom you would see looks like:
> > >>
> > >> ...kernel boot...
> > >> Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> > >> serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> > >> 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> > >> brd: module loaded
> > >> kgdb: Registered I/O driver kgdbts.
> > >> kgdbts:RUN plant and detach test
> > >> [...HARD HANG STARTS HERE...]
> > >>
> > >> The kernel is looping at that point waiting for the master kgdb cpu to
> > >> have all the slaves join the debugger but it never happens because the
> > >> perf callback chain which is used by the lockup detector eats the NMI
> > >> IPI event.  After the perf callback is processed perf returns
> > >> NOTIFY_STOP so the notifier which brings the slave CPU into the debugger
> > >> never fires.
> > >>     
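The failure mode described above can be sketched outside the kernel. The
following is a toy user-space model, not kernel code: the NotifierChain
class, the handler names, and the priorities are all hypothetical, and the
"greedy" perf handler is an assumption that exaggerates the behaviour
reported in this thread. It only illustrates that once a handler earlier
in the chain returns NOTIFY_STOP, handlers registered behind it never run.

```python
# Toy user-space model of a priority-ordered notifier chain.
# Names echo the thread, but nothing here is kernel code.
NOTIFY_DONE = 0   # handler did not claim the event; keep walking the chain
NOTIFY_STOP = 1   # handler claims the event; stop walking the chain


class NotifierChain:
    def __init__(self):
        self._handlers = []  # list of (priority, name, callback)

    def register(self, name, callback, priority=0):
        self._handlers.append((priority, name, callback))
        # Higher priority runs first; the sort is stable, so handlers with
        # equal priority keep their registration order.
        self._handlers.sort(key=lambda h: -h[0])

    def call(self, event):
        """Walk the chain and return the names of the handlers that ran."""
        ran = []
        for _prio, name, callback in self._handlers:
            ran.append(name)
            if callback(event) == NOTIFY_STOP:
                break
        return ran


def greedy_perf_handler(event):
    # Assumption: models a handler that claims every NMI it sees.
    return NOTIFY_STOP


def kgdb_handler(event):
    # Hypothetical stand-in for the notifier that would bring a slave CPU
    # into the debugger when the injected event arrives.
    return NOTIFY_STOP if event == "DIE_NMI" else NOTIFY_DONE


# perf sits ahead of kgdb: the event is eaten before kgdb sees it.
chain = NotifierChain()
chain.register("perf", greedy_perf_handler, priority=1)
chain.register("kgdb", kgdb_handler, priority=0)
ran_when_perf_first = chain.call("DIE_NMI")

# kgdb sits ahead of perf: kgdb gets the event.
reordered = NotifierChain()
reordered.register("perf", greedy_perf_handler, priority=0)
reordered.register("kgdb", kgdb_handler, priority=1)
ran_when_kgdb_first = reordered.call("DIE_NMI")

print(ran_when_perf_first)
print(ran_when_kgdb_first)
```

Reordering priorities is only one way to let the second handler run in this
model; whether that, or making the perf handler more selective about which
NMIs it claims, is the right change for the real code is exactly what is
being discussed here.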
> > >
> > > > Ok.  We have code to handle extra spurious NMIs, but it is hard to
> > > > accurately determine whether an NMI was for perf or for someone else.
> > > > This logic may still need tweaking.  What cpu are you running on?
> > > > AMD/Intel?  If Intel, then core/core2/nehalem?
> > >
> > >   
> > 
> > In this case I just built a 32 bit kernel and ran it under kvm on a 64
> > bit host.  I can send you the .config separately.
> > 
> > kvm  -nographic -k en-us -kernel arch/x86/boot/bzImage -net user -net
> > nic,macaddr=52:54:00:12:34:56,model=i82557b -append
> > "console=ttyS0,115200 ip=dhcp root=/dev/nfs
> > nfsroot=10.0.2.2:/space/exp/x86 rw acpi=force UMA=1" -smp 2
> 
> Does that mean you hit the problem on the kvm guest or on the host?  I
> wasn't aware that perf worked inside the guest (well, at least the
> hardware pieces of it, like NMI).
> 
> Cheers,
> Don