Message-ID: <CALCETrUt6vfCwWSJ7zOLuCJVC-NNK8n2h2XEGj2_=+JL2FDLkg@mail.gmail.com>
Date: Sat, 2 Dec 2017 08:05:22 -0800
From: Andy Lutomirski <luto@...nel.org>
To: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Borislav Petkov <bp@...en8.de>,
Brian Gerst <brgerst@...il.com>,
David Laight <David.Laight@...lab.com>,
Kees Cook <keescook@...omium.org>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 4/6] Unsuck "x86/entry/64: Create a percpu SYSCALL entry trampoline"
On Sat, Dec 2, 2017 at 7:18 AM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> On Thu, Nov 30, 2017 at 10:29:44PM -0800, Andy Lutomirski wrote:
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index caf74a1bb3de..28f4e7553c26 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -180,14 +180,24 @@ ENTRY(entry_SYSCALL_64_trampoline)
>>
>> /*
>> * x86 lacks a near absolute jump, and we can't jump to the real
>> - * entry text with a relative jump, so we fake it using retq.
>> + * entry text with a relative jump. We could push the target
>> + * address and then use retq, but this destroys the pipeline on
>> + * many CPUs (wasting over 20 cycles on Sandy Bridge). Instead,
>> + * spill RDI and restore it in a second-stage trampoline.
>> */
>> - pushq $entry_SYSCALL_64_after_hwframe
>> - retq
>> + pushq %rdi
>> + movq $entry_SYSCALL_64_stage2, %rdi
>> + jmp *%rdi
>> END(entry_SYSCALL_64_trampoline)
>>
>> .popsection
>>
>> +ENTRY(entry_SYSCALL_64_stage2)
>> + UNWIND_HINT_EMPTY
>> + popq %rdi
>> + jmp entry_SYSCALL_64_after_hwframe
>> +END(entry_SYSCALL_64_stage2)
>> +
>> ENTRY(entry_SYSCALL_64)
>> UNWIND_HINT_EMPTY
>> /*
>
> Another crazy idea:
>
> call 1f
> 1: movq $entry_SYSCALL_64_after_hwframe, (%rsp)
> ret
>
> Does that fix the regression?
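
I suspect that's as bad or worse. The issue (I think) is that the CPU
has a small, architecturally invisible internal stack (the return
stack buffer) that tracks calls and rets, and it speculates past a ret
on the assumption that the ret goes back to the instruction after the
most recent call. If the ret goes anywhere else, the CPU has to throw
away the speculated work and start over. Overwriting the return address
in memory doesn't update that internal stack, so your sequence still
mispredicts.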
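
To make that reasoning concrete, here is a toy model of the return predictor, written in Python purely for illustration. It is a sketch under heavy assumptions, not a description of any real microarchitecture: the class `ReturnPredictor` and the three scenario functions are hypothetical names, the predictor has unlimited depth, and nothing here measures cycles. It only counts whether a ret lands where the most recent call said it would.

```python
# Toy model of a return-address predictor (return stack buffer).
# Assumption: unlimited depth and no other predictors; real hardware differs.

TARGET = "entry_SYSCALL_64_after_hwframe"


class ReturnPredictor:
    def __init__(self):
        self.stack = []       # return addresses recorded by calls
        self.mispredicts = 0  # rets that went somewhere unexpected

    def call(self, return_addr):
        # A call records the address of the instruction that follows it.
        self.stack.append(return_addr)

    def ret(self, actual_target):
        # A ret is predicted to go to the most recently recorded address.
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_target:
            self.mispredicts += 1

    def indirect_jmp(self, actual_target):
        # An indirect jmp never consults the return predictor.
        # (The indirect branch predictor is not modeled here.)
        pass


def push_ret():
    # pushq $TARGET; retq
    p = ReturnPredictor()
    p.ret(TARGET)   # no matching call, so the prediction cannot match
    return p.mispredicts


def call_overwrite_ret():
    # call 1f; 1: movq $TARGET, (%rsp); ret
    p = ReturnPredictor()
    p.call("1f")    # predictor records label 1 as the return address
    p.ret(TARGET)   # memory was overwritten, the predictor was not
    return p.mispredicts


def spill_and_jmp():
    # pushq %rdi; movq $entry_SYSCALL_64_stage2, %rdi; jmp *%rdi
    p = ReturnPredictor()
    p.indirect_jmp("entry_SYSCALL_64_stage2")
    return p.mispredicts


if __name__ == "__main__":
    for scenario in (push_ret, call_overwrite_ret, spill_and_jmp):
        print(scenario.__name__, scenario())
```

In this model both ret-based sequences record one return misprediction and the spill-and-jmp sequence records none, which is the distinction I'm guessing at above. Whether the indirect jmp itself predicts well is a separate question that this sketch does not address.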