[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <53B12CB3.5050508@jp.fujitsu.com>
Date: Mon, 30 Jun 2014 18:24:03 +0900
From: "HATAYAMA, Daisuke" <d.hatayama@...fujitsu.com>
To: hpa@...or.com, ak@...ux.intel.com
CC: Don Zickus <dzickus@...hat.com>, matt@...sole-pimps.org,
peterz@...radead.org, acme@...nel.org, mingo@...hat.com,
paulus@...ba.org, tglx@...utronix.de, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false
NMI handling
Hello,
(2014/06/17 0:30), Don Zickus wrote:
> On Fri, Jun 13, 2014 at 05:44:37PM +0900, HATAYAMA Daisuke wrote:
>> Currently, a NMI handler for NMI watchdog may falsely handle any NMI
>> signaled for different purpose if CondChgd bit in
>> MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
>>
>> This commit deals with the issue simply by ignoring CondChgd bit.
>>
>> Here is explanation in detail.
>>
>> On x86 NMI watchdog uses performance monitoring feature to
>> periodically signal NMI each time performance counter gets overflowed.
>>
>> intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
>> handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
>> owner of a given NMI by looking at overflow status bits in
>> MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
>> handles the given NMI as its own NMI.
>>
>> The problem is that the intel_pmu_handle_irq() doesn't distinguish
>> CondChgd bit from other bits. Unlike the other status bits, CondChgd
>> bit doesn't represent overflow status for performance counters. Thus,
>> CondChgd bit cannot be thought of as a mark indicating a given NMI is
>> NMI watchdog's. As a result, if CondChgd bit is set, any NMI is
>> falsely handled by the NMI handler of NMI watchdog. Also, if type of
>> the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or
>> NMI_IO_CHECK, the corresponding action is never performed until
>> CondChgd bit is cleared.
>>
>> I noticed this behavior on systems with Ivy Bridge processors: Intel
>> Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
>> CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
>> in the beginning at boot. Then the CondChgd bit is immediately cleared
>> by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
>> 0.
>>
>> On the other hand, on older processors such as Nehalem, Xeon E7540,
>> CondChgd bit is not set in the beginning at boot.
>>
>> I'm not sure about exact behavior of CondChgd bit, in particular when
>> this bit is set. Although I read Intel System Programmer's Manual to
>> figure out that, the descriptions I found are:
>>
>> In 18.9.1:
>>
>> "The MSR_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
>> indicate changes to the state of performancmonitoring hardware"
>>
>> In Table 35-2 IA-32 Architectural MSRs
>>
>> 63 CondChg: status bits of this register has changed.
>>
>> These are different from the bahviour I see on the actual system as I
>> explained above.
>>
>> At least, I think ignoring CondChgd bit should be enough for NMI
>> watchdog perspective.
>
> As I said in a previous email, I ran into a similar problem and was going
> to solve it by zeroing out all the registers on init (which probably would
> have upset Peter :-) ). This is a smaller solution and seems ok. The
> only downside is it is called in the nmi handler.
>
>
> I am working with our customer to try and talk with Intel why this bit is
> set to begin with. Our customer says their BIOS doesn't use the PMU
> during boot so it wasn't clear why this is now set on IVBs (though I don't
> see them on Intel whitebox IVBs).
>
I'm also interested in the behaviour of CondChgd bit on Ivy Bridge processors.
Do you know something about this behaviour?
--
Thanks.
HATAYAMA, Daisuke
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists