Date:	Thu, 14 Apr 2011 20:45:40 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Don Zickus <dzickus@...hat.com>
Cc:	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Lin Ming <ming.m.lin@...el.com>,
	Shaun Ruffell <sruffell@...ium.com>,
	Maciej Rutecki <maciej.rutecki@...il.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Stephane Eranian <eranian@...gle.com>,
	Robert Richter <robert.richter@....com>,
	lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box


* Don Zickus <dzickus@...hat.com> wrote:

> On Thu, Apr 14, 2011 at 07:43:27PM +0200, Ingo Molnar wrote:
> > 
> > * Cyrill Gorcunov <gorcunov@...nvz.org> wrote:
> > 
> > > --- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
> > > +++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
> > > @@ -1370,9 +1370,16 @@ perf_event_nmi_handler(struct notifier_b
> > >  		return NOTIFY_DONE;
> > >  	}
> > > 
> > > -	apic_write(APIC_LVTPC, APIC_DM_NMI);
> > > 
> > >  	handled = x86_pmu.handle_irq(args->regs);
> > > +
> > > +	/*
> > > +	 * Note the unmasking of LVTPC entry must be
> > > +	 * done *after* counter overflow flag is cleared
> > > +	 * otherwise it might lead to double NMIs generation.
> > > +	 */
> > > +	apic_write(APIC_LVTPC, APIC_DM_NMI);
> > > +
> > >  	if (!handled)
> > >  		return NOTIFY_DONE;
> > > 
> > 
> > This breaks 'perf top' on Intel Nehalem and probably other CPUs. The NMI gets 
> > stuck fast on all CPUs:
> > 
> >   NMI: 16 6 3 3 3 3 3 3 3 3 3 3 3 3 4 5 Non-maskable interrupts
> 
> Damn it, I was working on getting there.  First I did P4s, now I was working 
> on acme's core2 issues.  Nehalem was next on my list, I swear! :-)))))
> 
> So this sucks.  I'll grab a Nehalem and see what went wrong.  It's probably 
> because of the other 'this seems to work' hacks I put in that handler.  I bet 
> if I clean those up, this problem will be fixed.
> 
> I will note that using my patch on a core2quad system, lowered the number of 
> back-to-back NMIs I was seeing when running a couple of perf records and a 
> make -j8 (still generates unknown NMIs though :-( ).

Here's the cpuinfo:

processor	: 15
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5560 @ 2.80GHz
stepping	: 5
cpu MHz		: 2794.000
cache size	: 8192 KB
physical id	: 1
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 23
initial apicid	: 23
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5599.19
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

just in case you have trouble reproducing the problem.
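
To tell whether the NMI counts are stuck, one option is to sample the NMI line of /proc/interrupts twice while 'perf top' runs and compare the per-CPU values. Below is a minimal userspace sketch of the parsing step, assuming the line format shown in the quoted output above ("NMI:" followed by one count per CPU and a text description). The helper name parse_nmi_line is hypothetical and is not part of the kernel or of perf.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the per-CPU counts from one line of
 * /proc/interrupts if it is the "NMI:" line.
 *
 * Returns the number of counts stored in counts[] (at most max),
 * or -1 if the line is not the NMI line.
 */
static int parse_nmi_line(const char *line, unsigned long *counts, int max)
{
	const char *p = line;
	int n = 0;

	/* The label is right-aligned, so skip leading whitespace. */
	while (isspace((unsigned char)*p))
		p++;

	if (strncmp(p, "NMI:", 4) != 0)
		return -1;
	p += 4;

	/* Read decimal counts until the trailing description starts. */
	while (n < max) {
		char *end;

		while (isspace((unsigned char)*p))
			p++;
		if (!isdigit((unsigned char)*p))
			break;

		counts[n++] = strtoul(p, &end, 10);
		p = end;
	}

	return n;
}
```

Feeding each line of /proc/interrupts to this helper at two points in time and comparing the two arrays shows which CPUs stopped receiving NMIs. It only reads counters and says nothing about why the NMIs stop.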

Thanks,

	Ingo