[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a8202fa8-a368-947f-994f-060ce83d188b@redhat.com>
Date: Mon, 11 Jun 2018 13:57:22 -0400
From: David Arcari <darcari@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
Andi Kleen <ak@...ux.intel.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Donald Zickus <dzickus@...hat.com>,
Prarit Bhargava <prarit@...hat.com>,
Jerry Hoemann <jerry.hoemann@....com>
Subject: Re: [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot
On 06/04/2018 10:12 AM, David Arcari wrote:
> On 06/04/2018 04:24 AM, Peter Zijlstra wrote:
>> On Sun, Jun 03, 2018 at 02:23:43PM -0400, David Arcari wrote:
>>> On some systems pressing the external NMI button is now failing to inject
>>> an NMI 5-10% of the time. This causes confusion for a user that expects
>>> the NMI to dump the system.
>>>
>>> Commit 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
>>> does not read the firmware setting of the FREEZE_WHILE_SMM bit and will
>>> always clear it when the PMU is initialized. As a result the performance
>>> counters will always run and that greatly expands the race in which
>>> external NMI will not be processed if a local NMI is already being
>>> processed.
>>>
>>> One option is to change default_do_nmi(). The code snippet below shows the
>>> relevant portion of a patch that resolves the issue, but it is problematic
>>> from a performance perspective and was dismissed.
>>>
>>> -345,7 +345,17 @@ static void default_do_nmi(struct pt_regs *regs)
>>> */
>>> if (handled > 1)
>>> __this_cpu_write(swallow_nmi, true);
>>> - return;
>>> +
>>> + /*
>>> + * Unfortunately, there is a race condition which can
>>> + * result in a missing an external NMI. Typically, an
>>> + * external NMI is processed on cpu 0. Therefore, on
>>> + * cpu 0 check for an external NMI before returning.
>>> + */
>>> + if (smp_processor_id() ||
>>> + (x86_platform.get_nmi_reason() & NMI_REASON_MASK) == 0) {
>>> + return;
>>> + }
>>> }
>>>
>>> Ultimately, the issue can be resolved by storing the default firmware
>>> setting of FREEZE_WHILE_SMM before initializing the PMU.
>>
>> I'm sorry, I know it's Monday morning, but what?! I really don't
>> understand anything you write there.
>>
>> Maybe if you explain the race and how your proposed fix closes it things
>> will make sense. The above refers to too many things not here.
>>
>
> Sorry.
>
> default_do_nmi() will process both perf events (local interrupts) as well as
> external interrupts (such as the NMI button). The handler is coded such that
> if a local interrupt occurs, no check is made for an external interrupt.
> Therefore, if the two interrupts occur simultaneously, the external interrupt
> is lost.
>
> The code above, which was ultimately discounted, attempts to avoid this
> scenario with as little performance impact as possible by reading the register
> without the spinlock for cpu 0 only (currently only cpu 0 can handle an
> external NMI, I verified this on my system by testing the NMI button with
> cpu 0 offline).
>
> The code above is problematic for a number of reasons not the least of which
> is performance. Furthermore, I don't see a less intrusive solution wrt
> do_default_nmi().
>
> Upstream 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
> appears to have made it relatively easy to hit this race condition. On some
> systems, this commit has resulted in a change to the default firmware setting
> of DEBUGCTLMSR_FREEZE_IN_SMM_BIT (it is now cleared by the OS by default).
>
> With this bit cleared, the following situation occurs:
>
> 1) external NMI - due to io check
> 2) long duration SMI (counters do not freeze)
> 3) NMI handler runs and misattributes interrupt to perf event
>
> Ultimately, my solution was to restore the previous behavior by reading and
> storing the firmware setting of the bit rather than to always clear it.
>
Hi Peter,
Have you had a chance to take a look at this?
Thanks,
-Dave
Powered by blists - more mailing lists