Message-Id: <8D8DF81C-3331-4105-8594-9600281010EF@amacapital.net>
Date: Thu, 7 Feb 2019 10:07:28 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Luck, Tony" <tony.luck@...el.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dan Williams <dan.j.williams@...el.com>,
Ingo Molnar <mingo@...nel.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>,
Rik van Riel <riel@...riel.com>
Subject: Re: [GIT PULL] x86/mm changes for v4.21
> On Feb 7, 2019, at 9:57 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>
>> On Thu, Feb 07, 2019 at 09:36:00AM -0800, Luck, Tony wrote:
>>> On Thu, Feb 07, 2019 at 03:01:31PM +0100, Peter Zijlstra wrote:
>>>> On Thu, Feb 07, 2019 at 11:50:52AM +0000, Linus Torvalds wrote:
>>>> If you re-generate the canonical address in __cpa_addr(), now we'll
>>>> actually have the real virtual address around for a lot of code-paths
>>>> (pte lookup etc), which was what people wanted to avoid in the first
>>>> place.
>>>
>>> Note that it's an 'unsigned long' address, not an actual pointer, and
>>> (afaict) none of the code paths use it as a pointer. This _should_ keep
>>> the CPU from following said pointer and doing a deref on it.
>>
>> The type doesn't matter. You want to avoid having the
>> true value in the register as long as possible. Ideal
>> spot would be the instruction before the TLB is flushed.
>>
>> The speculative issue is that any branch you encounter
>> while you have the address in a register may be mispredicted.
>> You might also get a bogus hit in the branch target cache
>> and speculatively jump into the weeds. While there you
>> could find an instruction that loads using that register, and
>> even though it is speculative and the instruction won't
>> retire, a machine check log will be created in a bank (no
>> machine check is signalled).
>>
>> Once the TLB is updated, you are safe. A speculative
>> access to an uncached address will not load or log anything.
>
> Something like so, then? AFAICT CLFLUSH will also #GP if you feed it crap.
>
>
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 4f8972311a77..d3ae92ad72a6 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -230,6 +230,28 @@ static bool __cpa_pfn_in_highmap(unsigned long pfn)
>
> #endif
>
> +/*
> + * Machine check recovery code needs to change cache mode of poisoned
> + * pages to UC to avoid speculative access logging another error. But
> + * passing the address of the 1:1 mapping to set_memory_uc() is a fine
> + * way to encourage a speculative access. So we cheat and flip the top
> + * bit of the address. This works fine for the code that updates the
> + * page tables. But at the end of the process we need to flush the cache
> + * and the non-canonical address causes a #GP fault when used by the
> + * CLFLUSH instruction.
> + *
> + * But in the common case we already have a canonical address. This code
> + * will fix the top bit if needed and is a no-op otherwise.

Joining this thread late...

This is all IMO rather crazy. How about we fiddle with CR0 to turn off the cache, then fiddle with page tables, then turn caching on? Or, heck, see if there’s some chicken bit we can set to improve the situation while we’re in the MCE handler.

Also, since I don’t really want to dig into the code to answer this, how exactly do we do a broadcast TLB flush from MCE context? We’re super-duper-atomic, and locks might be held on various CPUs. Shouldn’t we be telling the cpa code to skip the flush and then just have the MCE code do a full flush manually? The MCE code has already taken over all CPUs on non-LMCE systems.

Or, better yet, get Intel to fix the hardware. A failed speculative access while already in MCE context should just be ignored.