Message-ID: <41b30e9d-dcfe-ccae-8cd4-e88df8e4cb9c@amd.com>
Date: Fri, 15 Mar 2019 14:06:52 +0000
From: "Lendacky, Thomas" <Thomas.Lendacky@....com>
To: Peter Zijlstra <peterz@...radead.org>
CC: "x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Jiri Olsa <jolsa@...hat.com>
Subject: Re: [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when
disabling PMC
On 3/15/19 5:50 AM, Peter Zijlstra wrote:
> On Mon, Mar 11, 2019 at 04:48:44PM +0000, Lendacky, Thomas wrote:
>> On AMD processors, the detection of an overflowed counter in the NMI
>> handler relies on the current value of the counter. So, for example, to
>> check for overflow on a 48-bit counter, bit 47 is checked to see if it
>> is 1 (not overflowed) or 0 (overflowed).
>>
>> There is currently a race condition present when disabling and then
>> updating the PMC. Increased NMI latency in newer AMD processors makes this
>> race condition more pronounced.
>
> Increased NMI latency also makes the results less useful :/ What amount
> of skid are we talking about, and is there anything AMD is going to do
> about this?
I haven't looked into the amount of skid, but, yes, the hardware team is
looking at this.
>
>> If the counter value has overflowed, it is
>> possible to update the PMC value before the NMI handler can run.
>
> Arguably the WRMSR should sync against the PMI. That is the behaviour
> one would expect.
>
> Isn't that something you can fix in ucode? And could you please
> tell the hardware people this is disgusting?
Currently, there's nothing they've found that can be done in ucode for
this. But, yes, they know it's a problem and they're looking at what they
can do.
>
>> The updated PMC value is not an overflowed value, so when the perf NMI
>> handler does run, it will not find an overflowed counter. This may
>> appear as an unknown NMI resulting in either a panic or a series of
>> messages, depending on how the kernel is configured.
>>
>> To eliminate this race condition, the PMC value must be checked after
>> disabling the counter in x86_pmu_disable_all(), and, if overflowed, must
>> wait for the NMI handler to reset the value before continuing. Add a new,
>> optional, callable function that can be used to test for and resolve this
>> condition.
>>
>> Cc: <stable@...r.kernel.org> # 4.14.x-
>> Signed-off-by: Tom Lendacky <thomas.lendacky@....com>
>
>> +static void amd_pmu_wait_on_overflow(int idx, u64 config)
>> +{
>> + unsigned int i;
>> + u64 counter;
>> +
>> + /*
>> + * We shouldn't be calling this from NMI context, but add a safeguard
>> + * here to return, since if we're in NMI context we can't wait for an
>> + * NMI to reset an overflowed counter value.
>> + */
>> + if (in_nmi())
>> + return;
>> +
>> + /*
>> + * If the interrupt isn't enabled then we won't get the NMI that will
>> + * reset the overflow condition, so return.
>> + */
>> + if (!(config & ARCH_PERFMON_EVENTSEL_INT))
>> + return;
>> +
>> + /*
>> + * Wait for the counter to be reset if it has overflowed. This loop
>> + * should exit very, very quickly, but just in case, don't wait
>> + * forever...
>> + */
>> + for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
>> + rdmsrl(x86_pmu_event_addr(idx), counter);
>> + if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
>> + break;
>> +
>> + /* Might be in IRQ context, so can't sleep */
>> + udelay(1);
>> + }
>> +}
>
> Argh.. that's horrible, as I'm sure you fully appreciate :/
Yeah...
>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index b684f0294f35..f1d2f70000cd 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -606,6 +606,9 @@ void x86_pmu_disable_all(void)
>> continue;
>> val &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
>> wrmsrl(x86_pmu_config_addr(idx), val);
>> +
>> + if (x86_pmu.wait_on_overflow)
>> + x86_pmu.wait_on_overflow(idx, val);
>> }
>> }
>
> One alternative is adding amd_pmu_disable_all() to amd/core.c and using
> that. Then you can also change the loop to do the wait after all the
> WRMSRs, if that helps with latency.
I thought about that for both this and the next patch, but since it would
mean duplicating all of the code, I went with the added callbacks. It might
be worth it for this patch to have an AMD disable_all callback, though,
since it's not a lot of code to duplicate compared to handle_irq, and I
like the idea of doing the overflow check after all the WRMSRs.
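Roughly something like this is what I'd have in mind (untested sketch, treat it as pseudocode; the wait helper here takes only idx and would have to re-read the event config itself, unlike the version in the patch):

```c
static void amd_pmu_disable_all(void)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
	int idx;

	/* Disable every counter first... */
	x86_pmu_disable_all();

	/*
	 * ...and only then wait on each active counter for a pending
	 * overflow NMI, so the waits don't serialize with the WRMSRs.
	 */
	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
		if (!test_bit(idx, cpuc->active_mask))
			continue;

		amd_pmu_wait_on_overflow(idx);
	}
}
```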
Thanks for the review, Peter.
Tom
>