Open Source and information security mailing list archives
Date:   Fri, 13 Jul 2018 10:21:39 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Joerg Roedel <joro@...tes.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        "H . Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Juergen Gross <jgross@...e.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Borislav Petkov <bp@...en8.de>, Jiri Kosina <jkosina@...e.cz>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Brian Gerst <brgerst@...il.com>,
        David Laight <David.Laight@...lab.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        Eduardo Valentin <eduval@...zon.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Will Deacon <will.deacon@....com>,
        "Liguori, Anthony" <aliguori@...zon.com>,
        Daniel Gruss <daniel.gruss@...k.tugraz.at>,
        Hugh Dickins <hughd@...gle.com>,
        Kees Cook <keescook@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Waiman Long <llong@...hat.com>, Pavel Machek <pavel@....cz>,
        "David H . Gutteridge" <dhgutteridge@...patico.ca>,
        Joerg Roedel <jroedel@...e.de>
Subject: Re: [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack

On Fri, Jul 13, 2018 at 3:56 AM, Joerg Roedel <joro@...tes.org> wrote:
> Hi Andy,
>
> thanks for your valuable feedback.
>
> On Thu, Jul 12, 2018 at 02:09:45PM -0700, Andy Lutomirski wrote:
>> > On Jul 11, 2018, at 4:29 AM, Joerg Roedel <joro@...tes.org> wrote:
>> > -.macro SAVE_ALL pt_regs_ax=%eax
>> > +.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
>> >    cld
>> > +    /* Push segment registers and %eax */
>> >    PUSH_GS
>> >    pushl    %fs
>> >    pushl    %es
>> >    pushl    %ds
>> >    pushl    \pt_regs_ax
>> > +
>> > +    /* Load kernel segments */
>> > +    movl    $(__USER_DS), %eax
>>
>> If \pt_regs_ax != %eax, then this will behave oddly. Maybe it’s okay.
>> But I don’t see why this change was needed at all.
>
> This is a left-over from a previous approach I tried and then abandoned
> later. You are right, it is not needed.
>
>> > +/*
>> > + * Called with pt_regs fully populated and kernel segments loaded,
>> > + * so we can access PER_CPU and use the integer registers.
>> > + *
>> > + * We need to be very careful here with the %esp switch, because an NMI
>> > + * can happen everywhere. If the NMI handler finds itself on the
>> > + * entry-stack, it will overwrite the task-stack and everything we
>> > + * copied there. So allocate the stack-frame on the task-stack and
>> > + * switch to it before we do any copying.
>>
>> Ick, right. Same with machine check, though. You could alternatively
>> fix it by running NMIs on an irq stack if the irq count is zero.  How
>> confident are you that you got #MC right?
>
> Pretty confident, #MC uses the exception entry path which also handles
> entry-stack and user-cr3 correctly. It might go through the slow
> paranoid exit path, but that's okay for #MC I guess.
>
> And when the #MC happens while we switch to the task stack and do the
> copying, the same precautions as for NMI apply.
>
>> > + */
>> > +.macro SWITCH_TO_KERNEL_STACK
>> > +
>> > +    ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
>> > +
>> > +    /* Are we on the entry stack? Bail out if not! */
>> > +    movl    PER_CPU_VAR(cpu_entry_area), %edi
>> > +    addl    $CPU_ENTRY_AREA_entry_stack, %edi
>> > +    cmpl    %esp, %edi
>> > +    jae    .Lend_\@
>>
>> That’s an alarming assumption about the address space layout. How
>> about an xor and an and instead of cmpl?  As it stands, if the address
>> layout ever changes, the failure may be rather subtle.
>
> Right, I'll implement a more restrictive check.

But the check needs to be correct or we'll mess up, right?  I think
the code will be much more robust and easier to review if you check
"on the entry stack" instead of ">= the entry stack".  (Or <= -- I can
never remember how this works in AT&T syntax.)

>
>> Anyway, wouldn’t it be easier to solve this by just not switching
>> stacks on entries from kernel mode and making the entry stack bigger?
>> Stick an assertion in the scheduling code that we’re not on an entry
>> stack, perhaps.
>
> That'll save us the check whether we are on the entry stack and replace
> it with a check whether we are coming from user/vm86 mode. I don't think
> that this will simplify things much and I am a bit afraid that it'll
> break unwritten assumptions elsewhere. It is probably something we can
> look into later, separately from the basic pti-x32 enablement.
>

Fair enough.  There's also the issue that NMI still has to switch CR3
if it hits with the wrong CR3.

I personally much prefer checking whether you came from user mode
rather than the stack address, but I'm okay with either approach here.
