Message-ID: <20190315105004.GW5996@hirez.programming.kicks-ass.net>
Date: Fri, 15 Mar 2019 11:50:04 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Lendacky, Thomas" <Thomas.Lendacky@....com>
Cc: "x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Jiri Olsa <jolsa@...hat.com>
Subject: Re: [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when
disabling PMC

On Mon, Mar 11, 2019 at 04:48:44PM +0000, Lendacky, Thomas wrote:
> On AMD processors, the detection of an overflowed counter in the NMI
> handler relies on the current value of the counter. So, for example, to
> check for overflow on a 48 bit counter, bit 47 is checked to see if it
> is 1 (not overflowed) or 0 (overflowed).
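
For reference, the check being described boils down to something like
this (a sketch; counter_overflowed() is a made-up name for illustration,
the MSR read and mask match the patch below):

static bool counter_overflowed(int idx)
{
	u64 counter;

	/* Read the current counter value from its MSR */
	rdmsrl(x86_pmu_event_addr(idx), counter);

	/* Top bit (bit 47 on a 48-bit counter) clear => overflowed */
	return !(counter & (1ULL << (x86_pmu.cntval_bits - 1)));
}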
>
> There is currently a race condition present when disabling and then
> updating the PMC. Increased NMI latency in newer AMD processors makes this
> race condition more pronounced.
Increased NMI latency also makes the results less useful :/ What amount
of skid are we talking about, and is there anything AMD is going to do
about this?
> If the counter value has overflowed, it is
> possible to update the PMC value before the NMI handler can run.
Arguably the WRMSR should sync against the PMI. That is the behaviour
one would expect.
Isn't that something you can fix in ucode? And could you please tell
the hardware people this is disgusting?
> The updated PMC value is not an overflowed value, so when the perf NMI
> handler does run, it will not find an overflowed counter. This may
> appear as an unknown NMI resulting in either a panic or a series of
> messages, depending on how the kernel is configured.
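
The window in question is in the generic stop path, which disables the
counter and then rewrites it; roughly (simplified from
arch/x86/events/core.c, the names are real but the body is abridged):

static void x86_pmu_stop(struct perf_event *event, int flags)
{
	/* Clear the enable bit; an overflow may already be latched */
	x86_pmu.disable(event);

	/*
	 * Read/modify/write the counter value. If this completes before
	 * the (delayed) NMI arrives, the handler no longer sees an
	 * overflowed value and the NMI goes unclaimed.
	 */
	x86_perf_event_update(event);
}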
>
> To eliminate this race condition, the PMC value must be checked after
> disabling the counter in x86_pmu_disable_all(), and, if overflowed, must
> wait for the NMI handler to reset the value before continuing. Add a new,
> optional callback that can be used to test for and resolve this
> condition.
>
> Cc: <stable@...r.kernel.org> # 4.14.x-
> Signed-off-by: Tom Lendacky <thomas.lendacky@....com>
> +static void amd_pmu_wait_on_overflow(int idx, u64 config)
> +{
> +	unsigned int i;
> +	u64 counter;
> +
> +	/*
> +	 * We shouldn't be calling this from NMI context, but add a safeguard
> +	 * here to return, since if we're in NMI context we can't wait for an
> +	 * NMI to reset an overflowed counter value.
> +	 */
> +	if (in_nmi())
> +		return;
> +
> +	/*
> +	 * If the interrupt isn't enabled then we won't get the NMI that will
> +	 * reset the overflow condition, so return.
> +	 */
> +	if (!(config & ARCH_PERFMON_EVENTSEL_INT))
> +		return;
> +
> +	/*
> +	 * Wait for the counter to be reset if it has overflowed. This loop
> +	 * should exit very, very quickly, but just in case, don't wait
> +	 * forever...
> +	 */
> +	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
> +		rdmsrl(x86_pmu_event_addr(idx), counter);
> +		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
> +			break;
> +
> +		/* Might be in IRQ context, so can't sleep */
> +		udelay(1);
> +	}
> +}
Argh.. that's horrible, as I'm sure you fully appreciate :/
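
For context, the excerpt above doesn't show the wiring; presumably the
rest of the patch adds the hook to struct x86_pmu and points it at the
AMD implementation, along these lines (a sketch, placement assumed from
the hunk below):

/* arch/x86/events/perf_event.h: new optional callback (sketch) */
void (*wait_on_overflow)(int idx, u64 config);

/* arch/x86/events/amd/core.c: set it in the AMD pmu definition (sketch) */
.wait_on_overflow	= amd_pmu_wait_on_overflow,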
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index b684f0294f35..f1d2f70000cd 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -606,6 +606,9 @@ void x86_pmu_disable_all(void)
>  			continue;
>  		val &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
>  		wrmsrl(x86_pmu_config_addr(idx), val);
> +
> +		if (x86_pmu.wait_on_overflow)
> +			x86_pmu.wait_on_overflow(idx, val);
>  	}
>  }
One alternative is adding amd_pmu_disable_all() to amd/core.c and using
that. Then you can also change the loop to do the wait after all the
WRMSRs, if that helps with latency.
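
Concretely, that alternative could look something like this (an untested
sketch reusing the helper from the patch; two passes so the udelay()
loops don't serialize behind each WRMSR):

static void amd_pmu_disable_all(void)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
	int idx;

	/* First pass: clear the enable bit on all active counters */
	x86_pmu_disable_all();

	/* Second pass: wait out any overflow NMIs that are still pending */
	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
		u64 val;

		if (!test_bit(idx, cpuc->active_mask))
			continue;

		rdmsrl(x86_pmu_config_addr(idx), val);
		amd_pmu_wait_on_overflow(idx, val);
	}
}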