linux-kernel - Re: [PATCH v3 0/6] Static calls

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <F82AE531-5E8F-4F70-B700-861DFC66B80A@vmware.com>
Date:   Fri, 11 Jan 2019 19:17:20 +0000
From:   Nadav Amit <namit@...are.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
CC:     Josh Poimboeuf <jpoimboe@...hat.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Jason Baron <jbaron@...mai.com>, Jiri Kosina <jkosina@...e.cz>,
        David Laight <David.Laight@...lab.com>,
        Borislav Petkov <bp@...en8.de>,
        Julia Cartwright <julia@...com>, Jessica Yu <jeyu@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Edward Cree <ecree@...arflare.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [PATCH v3 0/6] Static calls

> On Jan 11, 2019, at 11:03 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Fri, Jan 11, 2019 at 7:15 AM Josh Poimboeuf <jpoimboe@...hat.com> wrote:
>>> Now, in the int3 handler can you take the faulting RIP and search for it in
>>> the “static-calls” table, writing the RIP+5 (offset) into R10 (return
>>> address) and the target into R11. You make the int3 handler to divert the
>>> code execution by changing pt_regs->rip to point to a new function that does:
>>> 
>>>      push R10
>>>      jmp __x86_indirect_thunk_r11
>>> 
>>> And then you are done. No?
>> 
>> IIUC, that sounds pretty much like what Steven proposed:
>> 
>>  https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.kernel.org%2Fr%2F20181129122000.7fb4fb04%40gandalf.local.home&amp;data=02%7C01%7Cnamit%40vmware.com%7Cd90f2aee03854a8da5dd08d677f7866e%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636828302371417454&amp;sdata=KEGVjCZbl5z7cCQX7F0%2FOaWfU7l%2Bvd2YDwZMXuOhpGI%3D&amp;reserved=0
>> 
>> I liked the idea, BUT, how would it work for callee-saved PV ops?  In
>> that case there's only one clobbered register to work with (rax).
> 
> Actually, there's a much simpler model now that I think about it.
> 
> The BP fixup just fixes up %rip to to point to "bp_int3_handler".
> 
> And that's just a random text address set up by "text_poke_bp()".
> 
> So how about the static call rewriting simply do this:
> 
> - for each static call:
> 
> 1)   create a fixup code stub that does
> 
>        push $returnaddressforTHIScall
>        jmp targetforTHIScall
> 
> 2) do
> 
>        on_each_cpu(do_sync_core, NULL, 1);
> 
>     to make sure all CPU's see this generated code
> 
>  3) do
> 
>        text_poke_bp(addr, newcode, newlen, generatedcode);
> 
> Ta-daa! Done.
> 
> In fact, it turns out that even the extra "do_sync_core()" in #2 is
> unnecessary, because taking the BP will be serializing on the CPU that
> takes it, so we can skip it.
> 
> End result: the text_poke_bp() function will do the two do_sync_core
> IPI's that guarantee that by the time it returns, no other CPU is
> using the generated code any more, so it can be re-used for the next
> static call fixup.
> 
> Notice? No odd emulation, no need to adjust the stack in the BP
> handler, just the regular "return to a different IP".
> 
> Now, there is a nasty special case with that stub, though.
> 
> So nasty thing with the whole "generate a stub for each call" case:
> because it's dynamic and because of the re-use of the stub, you could
> be in the situation where:
> 
>  CPU1                  CPU2
>  ----                  ----
> 
>  generate a stub
>  on_each_cpu(do_sync_core..)
>  text_poke_bp()
>  ...
> 
>  rewrite to BP
>                        trigger the BP
>                        return to the stub
>                        fun the first instruction of the stub
>                        *INTERRUPT causes rescheduling*
> 
>  on_each_cpu(do_sync_core..)
>  rewrite to good instruction
>  on_each_cpu(do_sync_core..)
> 
>  free or re-generate the stub
> 
>                        !! The stub is still in use !!
> 
> So that simple "just generate the stub dynamically" isn't so simple after all.
> 
> But it turns out that that is really simple to handle too. How do we do that?
> 
> We do that by giving the BP handler *two* code sequences, and we make
> the BP handler pick one depending on whether it is returning to a
> "interrupts disabled" or "interrupts enabled" case.
> 
> So the BP handler does this:
> 
> - if we're returning with interrupts disabled, pick the simple stub
> 
> - if we're returning with interrupts enabled, clkear IF in the return
> %rflags, and pick a *slightly* more complex stub:
> 
>        push $returnaddressforTHIScall
>        sti
>        jmp targetforTHIScall
> 
> and now the STI shadow will mean that this sequence is uninterruptible.
> 
> So we'd not do complex emulation of the call instruction at BP time,
> but we'd do that *trivial* change at BP time.
> 
> This seems simple, doesn't need any temporary registers at all, and
> doesn't need any extra stack magic. It literally needs just a trivial
> sequence in poke_int3_handler().
> 
> The we'd change the end of poke_int3_handler() to do something like
> this instead:
> 
>        void *newip = bp_int3_handler;
>        ..
>        if (new == magic_static_call_bp_int3_handler) {
>                if (regs->flags &X86_FLAGS_IF) {
>                        newip = magic_static_call_bp_int3_handler_sti;
>                        regs->flags &= ~X86_FLAGS_IF;
>        }
>        regs->ip = (unsigned long) newip;
>        return 1;
> 
> AAND now we're *really* done.
> 
> Does anybody see any issues in this?

I think it’s a better articulated, more detailed version to the solution I
proposed in this thread (which also used a patched trampoline with STI+JMP).

So obviously I like it. ;-)