linux-kernel - Re: [PATCH v2 6/8] x86/entry: add unwind hint annotations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <F1E11F3A-039D-48B2-A57D-7881E93028DD@amacapital.net>
Date:   Thu, 29 Jun 2017 15:59:04 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Josh Poimboeuf <jpoimboe@...hat.com>
Cc:     Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        live-patching@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jiri Slaby <jslaby@...e.cz>, Ingo Molnar <mingo@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Mike Galbraith <efault@....de>
Subject: Re: [PATCH v2 6/8] x86/entry: add unwind hint annotations



--Andy

> On Jun 29, 2017, at 2:41 PM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> 
>> On Thu, Jun 29, 2017 at 02:09:54PM -0700, Andy Lutomirski wrote:
>>> On Thu, Jun 29, 2017 at 12:05 PM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
>>>> On Thu, Jun 29, 2017 at 11:50:18AM -0700, Andy Lutomirski wrote:
>>>>> On Thu, Jun 29, 2017 at 10:53 AM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
>>>>> There's a bug here that will need a small change to the entry code.
>>>>> 
>>>>> Mike Galbraith reported:
>>>>> 
>>>>>  WARNING: can't dereference registers at ffffc900089d7e08 for ip ffffffff81740bbb
>>>>> 
>>>>> After some looking I found that it's caused by the following code
>>>>> snippet in the 'interrupt' macro in entry_64.S:
>>>>> 
>>>>>        /*
>>>>>         * Save previous stack pointer, optionally switch to interrupt stack.
>>>>>         * irq_count is used to check if a CPU is already on an interrupt stack
>>>>>         * or not. While this is essentially redundant with preempt_count it is
>>>>>         * a little cheaper to use a separate counter in the PDA (short of
>>>>>         * moving irq_enter into assembly, which would be too much work)
>>>>>         */
>>>>>        movq    %rsp, %rdi
>>>>>        incl    PER_CPU_VAR(irq_count)
>>>>>        cmovzq  PER_CPU_VAR(irq_stack_ptr), %rsp
>>>>>        UNWIND_HINT_REGS base=rdi
>>>>>        pushq   %rdi
>>>>>        UNWIND_HINT_REGS indirect=1
>>>>> 
>>>>> The problem is that it's changing the stack pointer *before* writing the
>>>>> previous stack pointer (push %rdi).  So when unwinding from an NMI which
>>>>> hit between the rsp write and the rdi push, the unwinder tries to access
>>>>> the regs on the previous stack (by reading rdi), but the previous stack
>>>>> pointer isn't there yet, so the access is considered out of bounds.
>>>> 
>>>> Ugh, that code.  Does this problem go away with this patch applied:
>>>> 
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=2231ec7e0bcc1a2bc94a17081511ab54cc6badd1
>>>> 
>>>> If so, want to update the patch for new kernels (shouldn't conflict
>>>> with anything except your unwind hints)?
>>> 
>>> I don't think that patch will fix it, because it still updates rsp
>>> *before* writing the old rsp on the new stack.  So there's still a
>>> window where the "previous stack" pointer is missing.
>> 
>> But it's in a register.  Is undwarf not able to grok that?
> 
> Sorry, I didn't explain it very well.  Undwarf can find the regs pointer
> in rdi, it just doesn't trust its value.
> 
> See the stack_info.next_sp field, which is set in in_irq_stack():
> 
>    /*
>     * The next stack pointer is the first thing pushed by the entry code
>     * after switching to the irq stack.
>     */
>    info->next_sp = (unsigned long *)*(end - 1);
> 
> It's a safety mechanism.  The unwinder needs the last word of the irq
> stack page to point to the previous stack.  That way it can double check
> that the stack pointer it calculates is within the bounds of either the
> current stack or the previous stack.
> 
> In the above code, the previous stack pointer (or next stack pointer,
> depending on your perspective) hasn't been set up before it switches
> stacks.  So the unwinder reads an uninitialized value into
> info->next_sp, and compares that with the regs pointer, and then stops
> the unwind because it thinks it went off into the weeds.
> 

That should be manageable, though, I think.  With my patch applied (and maybe even without it), the only exception to that rule is if regs->sp points just above the top of the IRQ stack and the next instruction is push reg.  In that case, the reg is exactly as trustworthy as the normal rule.*  Can you teach the unwinding code that this is okay?

* If an NMI hits right there, then it relies on unwinding out of the NMI correctly.  But the usual checks that the target stack is a valid stack should prevent us from going off into the weeds regardless.

> -- 
> Josh