Message-ID: <20190402130302.GL12232@hirez.programming.kicks-ass.net>
Date:   Tue, 2 Apr 2019 15:03:02 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Lendacky, Thomas" <Thomas.Lendacky@....com>
Cc:     "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Namhyung Kim <namhyung@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jiri Olsa <jolsa@...hat.com>, gorcunov@...il.com,
        Vince Weaver <vince@...ter.net>,
        Stephane Eranian <eranian@...gle.com>
Subject: Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency

On Mon, Apr 01, 2019 at 09:46:33PM +0000, Lendacky, Thomas wrote:
> This patch series addresses issues with increased NMI latency in newer
> AMD processors that can result in unknown NMI messages when PMC counters
> are active.
> 
> The following fixes are included in this series:
> 
> - Resolve a race condition when disabling an overflowed PMC counter,
>   specifically when updating the PMC counter with a new value.
> - Resolve handling of active PMC counter overflows in the perf NMI
>   handler and when to report that the NMI is not related to a PMC.
> - Remove earlier workaround for spurious NMIs by re-ordering the
>   PMC stop sequence to disable the PMC first and then remove the PMC
>   bit from the active_mask bitmap. As part of disabling the PMC, the
>   code will wait for an overflow to be reset.
> 
> The last patch re-works the order of when the PMC is removed from the
> active_mask. There was a comment from a long time ago about having
> to clear the bit in active_mask before disabling the counter because
> the perf NMI handler could re-enable the PMC again. Looking at the
> handler today, I don't see that as possible, hence the reordering. The
> question will be whether the Intel PMC support will now have issues.
> There is still support for using x86_pmu_handle_irq() in the Intel
> core.c file. Did Intel have any issues with spurious NMIs in the past?
> Peter Z, any thoughts on this?

I can't remember :/ I suppose we'll see if anything pops up after these
here patches. At least then we get a chance to properly document things.
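
For readers following along, the reordered stop sequence from the quoted
cover letter (disable the PMC, wait for any pending overflow to be reset,
and only then clear the bit in active_mask) can be sketched roughly as
below. This is a user-space toy model under stated assumptions, not the
kernel code: struct toy_pmu, toy_handle_nmi() and toy_stop() are
hypothetical names invented for illustration, and the late NMI is
delivered synchronously inside the wait loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_COUNTERS 4

/* Hypothetical toy model of a PMU; none of these names are kernel APIs. */
struct toy_pmu {
    uint64_t active_mask;                  /* bit i set => counter i is in use */
    bool     enabled[TOY_NUM_COUNTERS];    /* per-counter enable bit */
    bool     overflowed[TOY_NUM_COUNTERS]; /* overflow pending, not yet handled */
};

/*
 * Stand-in for the perf NMI handler: it only looks at counters that are
 * still set in active_mask and returns true if it handled an overflow.
 * A false return models an NMI that perf does not claim.
 */
static bool toy_handle_nmi(struct toy_pmu *pmu)
{
    bool handled = false;

    for (int i = 0; i < TOY_NUM_COUNTERS; i++) {
        if (!(pmu->active_mask & (1ULL << i)))
            continue;
        if (pmu->overflowed[i]) {
            pmu->overflowed[i] = false; /* reset the overflow */
            handled = true;
        }
    }
    return handled;
}

/* Reordered stop sequence: disable, wait for overflow reset, clear mask. */
static void toy_stop(struct toy_pmu *pmu, int idx)
{
    assert(idx >= 0 && idx < TOY_NUM_COUNTERS);

    /* 1. Disable the counter first so it cannot overflow again. */
    pmu->enabled[idx] = false;

    /*
     * 2. Wait (bounded) for a pending overflow to be reset. The counter is
     *    still in active_mask, so the handler recognises the late NMI.
     */
    for (int tries = 0; tries < 50 && pmu->overflowed[idx]; tries++)
        toy_handle_nmi(pmu);

    /* 3. Only now remove the counter from active_mask. */
    pmu->active_mask &= ~(1ULL << idx);
}
```

Because the counter stays in active_mask until step 3, an overflow that
was already in flight when the counter was disabled is still claimed by
the handler in this model instead of surfacing as an unclaimed NMI.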

> Also, I couldn't completely get rid of the "running" bit because it
> is used by arch/x86/events/intel/p4.c. An old commit comment that
> seems to indicate the p4 code suffered the spurious interrupts:
> 03e22198d237 ("perf, x86: Handle in flight NMIs on P4 platform").
> So maybe that partially answers my previous question...

Yeah, the P4 code is magic, and I don't have any such machines left, nor,
I think, does Cyrill, who wrote much of that.

I have vague memories of the P4 thing crashing with Vince's perf_fuzzer,
but maybe I'm wrong.

Ideally we'd find a willing victim to maintain that thing, or possibly
just delete it, dunno if anybody still cares.


Anyway, I like these patches, but I cannot apply them since you sent them
base64 encoded and my script chokes on that.
