Date:	Wed, 25 Aug 2010 13:00:06 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Robert Richter <robert.richter@....com>
Cc:	Don Zickus <dzickus@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Cyrill Gorcunov <gorcunov@...il.com>,
	Lin Ming <ming.m.lin@...el.com>,
	"fweisbec@...il.com" <fweisbec@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Huang, Ying" <ying.huang@...el.com>,
	Yinghai Lu <yinghai@...nel.org>,
	Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running
 perfctrs


* Ingo Molnar <mingo@...e.hu> wrote:

> > You might use the debug patch below for diagnostics.
> 
> Thanks, will try that and report back.

Here's a more detailed description of the regression introduced by:

  4a31beb: perf, x86: Fix handle_irq return values
  8e3e42b: perf, x86: Try to handle unknown nmis with an enabled PMU

Booting into the debug kernel, the system comes up fine - no NMI 
messages, as expected.

Then when I start 'perf top' for the first time I get the NMI message 
with this debug output:

 cpu #15, nmi #160, marked #0, handled = 1, time = 333392635730, delta = 11238255
 cpu #15, nmi #161, marked #0, handled = 1, time = 333403779380, delta = 11143650
 cpu #15, nmi #162, marked #0, handled = 1, time = 333415418497, delta = 11639117
 cpu #15, nmi #163, marked #0, handled = 1, time = 333415467084, delta = 48587
 cpu #15, nmi #164, marked #0, handled = 1, time = 333415501531, delta = 34447
 cpu #15, nmi #165, marked #0, handled = 1, time = 333459918106, delta = 44416575
 cpu #15, nmi #166, marked #0, handled = 0, time = 333459923167, delta = 5061
 cpu #15, nmi #151, marked #0, handled = 1, time = 332978597882, delta = 11447002
 cpu #15, nmi #152, marked #0, handled = 1, time = 332978657151, delta = 59269
 cpu #15, nmi #153, marked #0, handled = 1, time = 332978667847, delta = 10696
 cpu #15, nmi #154, marked #0, handled = 1, time = 333023125757, delta = 44457910
 cpu #15, nmi #155, marked #0, handled = 1, time = 333291980833, delta = 268855076
 cpu #15, nmi #156, marked #0, handled = 1, time = 333325663125, delta = 33682292
 cpu #15, nmi #157, marked #0, handled = 1, time = 333348216481, delta = 22553356
 cpu #15, nmi #158, marked #0, handled = 1, time = 333370168887, delta = 21952406
 cpu #15, nmi #159, marked #0, handled = 1, time = 333381397475, delta = 11228588
 Uhhuh. NMI received for unknown reason 00 on CPU 15.
 Do you have a strange power saving mode enabled?
 Dazed and confused, but trying to continue
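
The entries above are not printed in sequence (#160-#166 come before #151-#159), so it can help to sort them by NMI number before reading them. The following is a rough Python sketch for that, not part of the debug patch: the helper names are made up, the line format is assumed to be exactly what is shown above, and the sample keeps only the fields the sketch reads (the trailing delta field is left out).

```python
import re

# Assumed line format, taken from the debug output above.
LINE_RE = re.compile(
    r"cpu #(\d+), nmi #(\d+), marked #(\d+), handled = (\d+), time = (\d+)"
)

def parse_nmi_log(text):
    """Return one dict per debug line, sorted by NMI sequence number."""
    records = []
    for m in LINE_RE.finditer(text):
        cpu, nmi, marked, handled, time = map(int, m.groups())
        records.append(
            {"cpu": cpu, "nmi": nmi, "marked": marked,
             "handled": handled, "time": time}
        )
    return sorted(records, key=lambda r: r["nmi"])

def unhandled_with_gap(records):
    """Yield (record, time since the previous NMI) for each unhandled NMI."""
    for prev, cur in zip(records, records[1:]):
        if cur["handled"] == 0:
            yield cur, cur["time"] - prev["time"]

# Three entries from the output above, deliberately out of order.
SAMPLE = """
 cpu #15, nmi #165, marked #0, handled = 1, time = 333459918106
 cpu #15, nmi #166, marked #0, handled = 0, time = 333459923167
 cpu #15, nmi #164, marked #0, handled = 1, time = 333415501531
"""

records = parse_nmi_log(SAMPLE)
for rec, gap in unhandled_with_gap(records):
    print(f"cpu {rec['cpu']}: nmi #{rec['nmi']} unhandled, "
          f"{gap} time units after the previous NMI")
```

Run over the full output, this singles out NMI #166 as the only unhandled one, arriving very shortly after NMI #165.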

When I start perf top a second time, no messages are printed at all. 
The reason is that NMIs are 'stuck' on one of the CPUs:

 NMI: 78164 67099 6342 [*] 65677 66119 63796 65395 63995 65012 64151 65082 
      63483 64948 62926 65608 62630

CPU#2 is stuck at 6342.
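
A stuck CPU like this can be spotted by comparing two snapshots of the NMI line from /proc/interrupts, taken a few seconds apart while perf top is running. A minimal Python sketch, under these assumptions: the function names are hypothetical, the counts in the example are invented for illustration (only 6342 is from the line above), and a CPU whose count did not move is merely a candidate, since an otherwise idle CPU may legitimately take few NMIs.

```python
def parse_nmi_counts(proc_interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts-style text."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # trailing description, e.g. "Non-maskable interrupts"
                counts.append(int(field))
            return counts
    return []

def stuck_cpus(before, after):
    """Return the CPU indexes whose NMI count did not change."""
    return [cpu for cpu, (a, b) in enumerate(zip(before, after)) if a == b]

# Invented 4-CPU snapshots; CPU 2 does not advance.
first = parse_nmi_counts("NMI:  100  200  6342  300   Non-maskable interrupts\n")
second = parse_nmi_counts("NMI:  150  260  6342  390   Non-maskable interrupts\n")
print(stuck_cpus(first, second))
```

On a live system the two snapshots would come from reading /proc/interrupts twice with a sleep in between.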

The NMIs work fine on other CPUs and perf top works (sans the missing 
samples from CPU#2), and the NMIs keep ticking.

The CPU is:

 processor	: 2
 vendor_id	: GenuineIntel
 cpu family	: 6
 model		: 26
 model name	: Intel(R) Xeon(R) CPU           X55600 @ 2.80GHz
 stepping	: 5
 cpu MHz	: 2794.000
 cache size	: 8192 KB
 physical id	: 0
 siblings	: 8
 core id	: 1
 cpu cores	: 4
 apicid		: 2
 initial apicid	: 2
 fpu		: yes
 fpu_exception	: yes
 cpuid level	: 11
 wp		: yes
 flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
 bogomips	: 5599.98
 clflush size	: 64
 cache_alignment: 64
 address sizes	: 40 bits physical, 48 bits virtual
 power management:

The PMU init is:

 Performance Events: PEBS fmt1+, Nehalem events, Intel PMU driver.
 ... version:                3
 ... bit width:              48
 ... generic registers:      4
 ... value mask:             0000ffffffffffff
 ... max period:             000000007fffffff
 ... fixed-purpose events:   3
 ... event mask:             000000070000000f
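
For reference, the masks in this init output follow from the counter geometry. The Python sketch below reproduces them; it assumes the fixed-purpose counters start at bit 32 of the event mask and that the max period is 2^31 - 1, and the function name is made up for illustration.

```python
FIXED_COUNTER_BASE = 32  # assumption: fixed counters start at bit 32

def pmu_masks(bit_width, generic, fixed):
    """Derive the value mask and event mask from the counter geometry."""
    value_mask = (1 << bit_width) - 1
    generic_bits = (1 << generic) - 1
    fixed_bits = ((1 << fixed) - 1) << FIXED_COUNTER_BASE
    return value_mask, generic_bits | fixed_bits

# Geometry from the init output: 48-bit counters, 4 generic, 3 fixed.
value_mask, event_mask = pmu_masks(bit_width=48, generic=4, fixed=3)
max_period = (1 << 31) - 1

print(f"value mask: {value_mask:016x}")
print(f"max period: {max_period:016x}")
print(f"event mask: {event_mask:016x}")
```

The low four bits of the event mask correspond to the generic registers and bits 32-34 to the three fixed-purpose events.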

I've attached the config as well.

Thanks,

	Ingo

View attachment "config" of type "text/plain" (83224 bytes)
