[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180612165635.GT12198@hirez.programming.kicks-ass.net>
Date: Tue, 12 Jun 2018 18:56:35 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: David Arcari <darcari@...hat.com>
Cc: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
Andi Kleen <ak@...ux.intel.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Donald Zickus <dzickus@...hat.com>,
Prarit Bhargava <prarit@...hat.com>,
Jerry Hoemann <jerry.hoemann@....com>
Subject: Re: [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot
On Mon, Jun 04, 2018 at 10:12:20AM -0400, David Arcari wrote:
> default_do_nmi() will process both perf events (local interrupts) as well as
> external interrupts (such as the NMI button). The handler is coded such that
> if a local interrupt occurs, no check is made for an external interrupt.
> Therefore, if the two interrupts occur simultaneously, the external interrupt
> is lost.
ACK, NMI handling on x86 is less than ideal.
> The code above, which was ultimately discounted, attempts to avoid this
> scenario with as little performance impact as possible by reading the register
> without the spinlock for cpu 0 only (currently only cpu 0 can handle an
> external NMI, I verified this on my system by testing the NMI button with
> cpu 0 offline).
>
> The code above is problematic for a number of reasons not the least of which
> is performance. Furthermore, I don't see a less intrusive solution wrt
> do_default_nmi().
Right, because reading the register itself is dog slow IIRC.
> Upstream 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
> appears to have made it relatively easy to hit this race condition. On some
> systems, this commit has resulted in a change to the default firmware setting
> of DEBUGCTLMSR_FREEZE_IN_SMM_BIT (it is now cleared by the OS by default).
>
> With this bit cleared, the following situation occurs:
>
> 1) external NMI - due to io check
> 2) long duration SMI (counters do not freeze)
> 3) NMI handler runs and misattributes interrupt to perf event
I think 3 is wrong, because 2 will in fact have triggered a PMI (due to
long running) so 3 will observe a PMI and claim the NMI. No
misattribution what so ever.
Because if this wasn't the case, flipping FREEZE_IN_SMM wouldn't have
made a difference.
> Ultimately, my solution was to restore the previous behavior by reading and
> storing the firmware setting of the bit rather than to always clear it.
Ah, urgh.. what a mess. So the OS setting the bit to a known and
consistent value is 'good' IMO. The firmware magically frobbing things
is 'bad'.
Now, explain to me why an IO-check results in an external NMI, and why
there are long running SMI handlers around? Why can't the IO error not
be propagated through the regular device interrupt/state? Why are long
running SMIs required at all, ever? Why doesn't the OS handler whatever
it is the SMM does?
Are you not solving the wrong problem here?
Powered by blists - more mailing lists