Message-ID: <20100825110006.GB27891@elte.hu>
Date: Wed, 25 Aug 2010 13:00:06 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Robert Richter <robert.richter@....com>
Cc: Don Zickus <dzickus@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Cyrill Gorcunov <gorcunov@...il.com>,
Lin Ming <ming.m.lin@...el.com>,
"fweisbec@...il.com" <fweisbec@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Huang, Ying" <ying.huang@...el.com>,
Yinghai Lu <yinghai@...nel.org>,
Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running
perfctrs
* Ingo Molnar <mingo@...e.hu> wrote:
> > You might use the debug patch below for diagnostics.
>
> Thanks, will try that and report back.
Here's a more detailed description of the regression introduced by:
4a31beb: perf, x86: Fix handle_irq return values
8e3e42b: perf, x86: Try to handle unknown nmis with an enabled PMU
Booting into the debug kernel the system boots up fine - no NMI
messages, as expected.
Then when i start 'perf top' for the first time i get the NMI message
with this debug output:
cpu #15, nmi #160, marked #0, handled = 1, time = 333392635730, delta = 11238255
cpu #15, nmi #161, marked #0, handled = 1, time = 333403779380, delta = 11143650
cpu #15, nmi #162, marked #0, handled = 1, time = 333415418497, delta = 11639117
cpu #15, nmi #163, marked #0, handled = 1, time = 333415467084, delta = 48587
cpu #15, nmi #164, marked #0, handled = 1, time = 333415501531, delta = 34447
cpu #15, nmi #165, marked #0, handled = 1, time = 333459918106, delta = 44416575
cpu #15, nmi #166, marked #0, handled = 0, time = 333459923167, delta = 5061
cpu #15, nmi #151, marked #0, handled = 1, time = 332978597882, delta = 11447002
cpu #15, nmi #152, marked #0, handled = 1, time = 332978657151, delta = 59269
cpu #15, nmi #153, marked #0, handled = 1, time = 332978667847, delta = 10696
cpu #15, nmi #154, marked #0, handled = 1, time = 333023125757, delta = 44457910
cpu #15, nmi #155, marked #0, handled = 1, time = 333291980833, delta = 268855076
cpu #15, nmi #156, marked #0, handled = 1, time = 333325663125, delta = 33682292
cpu #15, nmi #157, marked #0, handled = 1, time = 333348216481, delta = 22553356
cpu #15, nmi #158, marked #0, handled = 1, time = 333370168887, delta = 21952406
cpu #15, nmi #159, marked #0, handled = 1, time = 333381397475, delta = 11228588
Uhhuh. NMI received for unknown reason 00 on CPU 15.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
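For anyone who wants to cross-check a trace like the one above, here is a minimal sketch of a parser. It is not part of the debug patch; the function names are made up for illustration, and it assumes that 'delta' is simply the difference between the 'time' values of two consecutive NMIs on the same CPU, which is what the numbers above suggest.

```python
import re

# Matches one line of the debug output shown above.
LINE_RE = re.compile(
    r"cpu #(\d+), nmi #(\d+), marked #(\d+), handled = (\d+), "
    r"time = (\d+), delta = (\d+)"
)

FIELDS = ("cpu", "nmi", "marked", "handled", "time", "delta")


def parse_trace(text):
    """Parse debug lines into dicts, sorted by NMI sequence number."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            entries.append(dict(zip(FIELDS, map(int, m.groups()))))
    # The dump may not be in sequence order, so sort before comparing.
    return sorted(entries, key=lambda e: e["nmi"])


def check_deltas(entries):
    """Return NMI numbers whose delta differs from time - previous time.

    Assumption: delta is the time since the previous NMI on this CPU.
    Only directly consecutive NMI numbers are compared.
    """
    bad = []
    for prev, cur in zip(entries, entries[1:]):
        if cur["nmi"] == prev["nmi"] + 1:
            if cur["time"] - prev["time"] != cur["delta"]:
                bad.append(cur["nmi"])
    return bad


def unhandled(entries):
    """Return NMI numbers that no handler claimed (handled = 0)."""
    return [e["nmi"] for e in entries if e["handled"] == 0]
```

Feeding the trace above through parse_trace() and then unhandled() should pick out nmi #166, the one that arrives right after a handled NMI and triggers the "unknown reason" message.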
When i start perf top for a second time, no messages are printed at all.
The reason is that on one of the CPUs NMIs are 'stuck':
NMI: 78164 67099 6342 [*] 65677 66119 63796 65395 63995 65012 64151 65082
63483 64948 62926 65608 62630
CPU#2 is stuck at 6342.
The NMIs work fine on other CPUs and perf top works (sans the missing
samples from CPU#2), and the NMIs keep ticking.
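A rough way to spot such a CPU from a single snapshot of the NMI line is sketched below. The helper names and the 0.5 threshold are arbitrary choices made here, not anything from the kernel; comparing two snapshots taken a few seconds apart would be the more reliable test for "stuck".

```python
from statistics import median


def parse_nmi_counts(line):
    """Extract the per-CPU counts from an 'NMI:' line.

    Non-numeric fields (the 'NMI:' label, markers such as '[*]',
    or a trailing description) are skipped.
    """
    return [int(field) for field in line.split() if field.isdigit()]


def lagging_cpus(counts, ratio=0.5):
    """Return CPU indices whose count is below ratio * median.

    The ratio is a hypothetical threshold, chosen only for illustration.
    """
    mid = median(counts)
    return [cpu for cpu, n in enumerate(counts) if n < ratio * mid]


snapshot = (
    "NMI: 78164 67099 6342 [*] 65677 66119 63796 65395 63995 65012 "
    "64151 65082 63483 64948 62926 65608 62630"
)
print(lagging_cpus(parse_nmi_counts(snapshot)))  # prints [2]
```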
The CPU is:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
stepping : 5
cpu MHz : 2794.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 5599.98
clflush size : 64
cache_alignment: 64
address sizes : 40 bits physical, 48 bits virtual
power management:
The PMU init is:
Performance Events: PEBS fmt1+, Nehalem events, Intel PMU driver.
... version: 3
... bit width: 48
... generic registers: 4
... value mask: 0000ffffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 000000070000000f
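As a cross-check, the three masks printed above can be reproduced from the counter counts and the bit width. This is only a sketch: it assumes the fixed-purpose counters occupy bits starting at index 32 of the event mask and that the max period is capped at 31 bits, which is consistent with the values printed here; pmu_masks() and FIXED_IDX_BASE are names invented for this example.

```python
# Assumption: fixed-purpose counters start at bit 32 of the event mask.
FIXED_IDX_BASE = 32


def pmu_masks(bit_width, num_generic, num_fixed):
    """Recompute value mask, max period and event mask.

    bit_width   -- counter width in bits ("bit width" above)
    num_generic -- number of generic counters ("generic registers")
    num_fixed   -- number of fixed counters ("fixed-purpose events")
    """
    value_mask = (1 << bit_width) - 1
    # Assumption: the period is limited to 31 bits regardless of width.
    max_period = (1 << 31) - 1
    generic_bits = (1 << num_generic) - 1
    fixed_bits = ((1 << num_fixed) - 1) << FIXED_IDX_BASE
    return value_mask, max_period, generic_bits | fixed_bits


for mask in pmu_masks(bit_width=48, num_generic=4, num_fixed=3):
    print(format(mask, "016x"))
```

With the values from this box (48, 4, 3) the three printed lines match the value mask, max period and event mask shown in the init output.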
I've attached the config as well.
Thanks,
Ingo
View attachment "config" of type "text/plain" (83224 bytes)