linux-kernel - Re: [PATCH 4/6] Unsuck "x86/entry/64: Create a percpu SYSCALL entry trampoline"

Message-ID: <CALCETrUt6vfCwWSJ7zOLuCJVC-NNK8n2h2XEGj2_=+JL2FDLkg@mail.gmail.com>
Date:   Sat, 2 Dec 2017 08:05:22 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Josh Poimboeuf <jpoimboe@...hat.com>
Cc:     Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        David Laight <David.Laight@...lab.com>,
        Kees Cook <keescook@...omium.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 4/6] Unsuck "x86/entry/64: Create a percpu SYSCALL entry trampoline"

On Sat, Dec 2, 2017 at 7:18 AM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> On Thu, Nov 30, 2017 at 10:29:44PM -0800, Andy Lutomirski wrote:
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index caf74a1bb3de..28f4e7553c26 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -180,14 +180,24 @@ ENTRY(entry_SYSCALL_64_trampoline)
>>
>>       /*
>>        * x86 lacks a near absolute jump, and we can't jump to the real
>> -      * entry text with a relative jump, so we fake it using retq.
>> +      * entry text with a relative jump.  We could push the target
>> +      * address and then use retq, but this destroys the pipeline on
>> +      * many CPUs (wasting over 20 cycles on Sandy Bridge).  Instead,
>> +      * spill RDI and restore it in a second-stage trampoline.
>>        */
>> -     pushq   $entry_SYSCALL_64_after_hwframe
>> -     retq
>> +     pushq   %rdi
>> +     movq    $entry_SYSCALL_64_stage2, %rdi
>> +     jmp     *%rdi
>>  END(entry_SYSCALL_64_trampoline)
>>
>>       .popsection
>>
>> +ENTRY(entry_SYSCALL_64_stage2)
>> +     UNWIND_HINT_EMPTY
>> +     popq    %rdi
>> +     jmp     entry_SYSCALL_64_after_hwframe
>> +END(entry_SYSCALL_64_stage2)
>> +
>>  ENTRY(entry_SYSCALL_64)
>>       UNWIND_HINT_EMPTY
>>       /*
>
> Another crazy idea:
>
>         call    1f
> 1:      movq    $entry_SYSCALL_64_after_hwframe, (%rsp)
>         ret
>
> Does that fix the regression?

I suspect that's as bad or worse.  The issue (I think) is that the CPU
has a little invisible internal stack that tracks calls and rets and
the CPU will speculate past a ret under the assumption that it returns
to the last call on the stack.  If it doesn't, then the CPU has to
start over.
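
To make the argument above concrete, here is a toy Python model of the
"little invisible internal stack" (commonly called a return stack buffer).
Everything in it is an assumption made for illustration: the class and
function names are hypothetical, the "older_caller" entry stands in for
whatever call happened earlier, and real CPUs are far more complicated.
It only counts whether a ret lands where the most recent call said it
would, for the three sequences discussed in this thread.

```python
# Toy model of a return-address predictor. Illustration only; this is not
# a description of how any particular CPU is implemented.

TARGET = "entry_SYSCALL_64_after_hwframe"


class ReturnPredictor:
    def __init__(self):
        self.stack = []        # predicted return addresses, newest last
        self.mispredicts = 0   # rets that went somewhere unexpected

    def call(self, return_address):
        # A call records the address of the instruction that follows it.
        self.stack.append(return_address)

    def ret(self, actual_target):
        # A ret is predicted to go to the most recent recorded address.
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_target:
            self.mispredicts += 1


def push_ret(p):
    # pushq $target; retq
    # No matching call, so the prediction comes from some older call.
    p.ret(TARGET)


def call_mov_ret(p):
    # call 1f; 1: movq $target, (%rsp); ret
    # The call records the address of label 1, but the ret goes elsewhere.
    p.call("1f")
    p.ret(TARGET)


def indirect_jmp(p):
    # pushq %rdi; movq $stage2, %rdi; jmp *%rdi
    # No call or ret, so the return predictor is never consulted.
    pass


def count_mispredicts(sequence):
    p = ReturnPredictor()
    p.call("older_caller")  # assumed earlier call that has not returned
    sequence(p)
    return p.mispredicts


if __name__ == "__main__":
    for seq in (push_ret, call_mov_ret, indirect_jmp):
        print(seq.__name__, count_mispredicts(seq))
```

In this model both ret-based sequences mispredict the ret, while the
indirect jmp never touches the return predictor. The model says nothing
about how well the indirect jump itself is predicted, or about actual
cycle counts; those would have to be measured.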