linux-kernel - Re: [V2 PATCH 0/6] x86, NMI: give NMI handler a face-lift


Message-ID: <4CDD6CAD.30303@windriver.com>
Date:	Fri, 12 Nov 2010 10:34:53 -0600
From:	Jason Wessel <jason.wessel@...driver.com>
To:	Don Zickus <dzickus@...hat.com>
CC:	Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
	Robert Richter <robert.richter@....com>, ying.huang@...el.com,
	Andi Kleen <andi@...stfloor.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: [V2 PATCH 0/6] x86, NMI: give NMI handler a face-lift

On 11/12/2010 10:11 AM, Don Zickus wrote:
> On Fri, Nov 12, 2010 at 09:55:53AM -0600, Jason Wessel wrote:
>   
>>> To answer your question, I doubt this patch series will change that
>>> outcome if it is still broken.
>>>
>>>   
>>>       
>> It was most definitely broken in 2.6.36->2.6.37-rc1.  Randy Dunlap had
>> pointed this out in a separate exchange that was not on LKML.
>>     
>
> Can you clarify what you mean by broken above?  Was 2.6.36 good or bad?
>
>   

It was absolutely broken in 2.6.36, which I believe is where the new
LOCKUP_DETECTOR changes were introduced.

I tested 2.6.35 and it does not hard hang, but it suffers from a different
problem caused by a perf API change.  The kgdb tests appear to loop
endlessly, emitting continuous streams of output in 2.6.35, and I already
have that problem patched.

We have to get back to a working baseline.  At this point, if you use
2.6.37-rc1, the last remaining problem is the perf + lockup detector
callback eating the injected DIE_NMI event, which is meant to enter the
debugger.
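
As a rough illustration of the failure mode, here is a minimal userspace
model of a priority-ordered notifier chain.  It is not kernel code: the
handler names, priorities, event strings and return values are all
hypothetical and chosen only to show how a handler that returns a "stop"
value for every NMI keeps a later handler from ever running.

```python
# Simplified userspace model of a priority-ordered notifier chain.
# NOT kernel code: the constants below are arbitrary stand-ins, and every
# handler, priority and event name is an assumption made for illustration.
NOTIFY_DONE = 0  # "not mine, keep walking the chain"
NOTIFY_STOP = 1  # "handled, stop walking the chain"


def call_chain(chain, event):
    """Call handlers from highest to lowest priority; return the names that ran."""
    ran = []
    for _priority, name, handler in sorted(chain, key=lambda entry: -entry[0]):
        ran.append(name)
        if handler(event) == NOTIFY_STOP:
            break  # nothing registered after this handler sees the event
    return ran


def perf_greedy(event):
    # Hypothetical handler that claims every NMI, even ones it did not raise.
    return NOTIFY_STOP


def perf_selective(event):
    # Hypothetical handler that claims only the NMIs it raised itself.
    return NOTIFY_STOP if event == "perf" else NOTIFY_DONE


def kgdb_handler(event):
    # Hypothetical debugger handler waiting for the injected NMI.
    return NOTIFY_STOP if event == "kgdb_ipi" else NOTIFY_DONE


broken = [(10, "perf", perf_greedy), (0, "kgdb", kgdb_handler)]
passes_on = [(10, "perf", perf_selective), (0, "kgdb", kgdb_handler)]

print(call_chain(broken, "kgdb_ipi"))     # only the perf handler runs
print(call_chain(passes_on, "kgdb_ipi"))  # the kgdb handler also runs
```

In this model the slave CPU would never join the debugger in the first
case, because the debugger's handler is never called.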


>> The symptom you would see looks like:
>>
>> ...kernel boot...
>> Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
>> serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>> 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>> brd: module loaded
>> kgdb: Registered I/O driver kgdbts.
>> kgdbts:RUN plant and detach test
>> [...HARD HANG STARTS HERE...]
>>
>> The kernel is looping at that point waiting for the master kgdb cpu to
>> have all the slaves join the debugger but it never happens because the
>> perf callback chain which is used by the lockup detector eats the NMI
>> IPI event.  After the perf callback is processed perf returns
>> NOTIFY_STOP so the notifier which brings the slave CPU into the debugger
>> never fires.
>>     
>
> Ok.  We have code to handle extra spurious NMIs, but it is hard to accurately
> determine if the NMI was for perf or someone else.  This logic may still
> need tweaking.  What cpu are you running on?  AMD/Intel?  If Intel, then
> core/core2/nehalem?
>
>   

In this case I just built a 32-bit kernel and ran it under kvm on a
64-bit host.  I can send you the .config separately.

kvm -nographic -k en-us -kernel arch/x86/boot/bzImage \
    -net user -net nic,macaddr=52:54:00:12:34:56,model=i82557b \
    -append "console=ttyS0,115200 ip=dhcp root=/dev/nfs nfsroot=10.0.2.2:/space/exp/x86 rw acpi=force UMA=1" \
    -smp 2


Thanks,
Jason.