linux-kernel - Re: BUG: unable to handle kernel paging request in __switch

Message-ID: <CALCETrVRq9OKYu+rbvKeuY+pD14X8etW3hzVxJqznzH9T_PvMg@mail.gmail.com>
Date:   Thu, 14 Dec 2017 10:54:46 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        syzbot 
        <bot+1f445b1009b8eeededa30fe62ccf685f2ec9d155@...kaller.appspotmail.com>,
        Borislav Petkov <bp@...e.de>,
        Dmitry Safonov <dsafonov@...tuozzo.com>,
        Peter Anvin <hpa@...or.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Lutomirski <luto@...nel.org>,
        Kyle Huey <me@...ehuey.com>, Ingo Molnar <mingo@...hat.com>,
        syzkaller-bugs@...glegroups.com,
        "the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: BUG: unable to handle kernel paging request in __switch_to

On Thu, Dec 14, 2017 at 10:42 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Thu, Dec 14, 2017 at 9:12 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> On Sun, 3 Dec 2017, syzbot wrote:
>>> BUG: unable to handle kernel paging request at fffffffffffffff8
>>> Oops: 0002 [#1] SMP KASAN
>
> System write of a non-existent page.
>
>>> RIP: 0010:switch_fpu_prepare arch/x86/include/asm/fpu/internal.h:535 [inline]
>>> RIP: 0010:__switch_to+0x95b/0x1330 arch/x86/kernel/process_64.c:407
>
> This says it's
>
>      old_fpu->last_cpu = cpu;
>
> and the code disassembly ends up looking something like this:
>
>    0: 48 c1 ea 03          shr    $0x3,%rdx
>    4: 0f b6 04 02          movzbl (%rdx,%rax,1),%eax
>    8: 84 c0                test   %al,%al
>    a: 74 08                je     0x14
>    c: 3c 03                cmp    $0x3,%al
>    e: 0f 8e d5 06 00 00    jle    0x6e9
>   14: 8b 85 70 fe ff ff    mov    -0x190(%rbp),%eax
>   1a: 41 89 84 24 c0 15 00 mov    %eax,0x15c0(%r12)
>   21: 00
>   22:* cc                    int3    <-- trapping instruction
>
> where the preceding two "mov" instructions look like they might indeed be that
>
>      old_fpu->last_cpu = cpu;
>
> thing, and the register state doesn't look insane for this.
>
> So I think the RIP->line encoding is slightly off, and that "int3" is
> almost certainly due to the very next thing after the write:
>
>                 trace_x86_fpu_regs_deactivated(old_fpu);
>
> and that actually makes sense if the test robot is doing some tracing,
> particularly if it's just about to _start_ tracing, and it has
> replaced the first byte of the instruction with 'int3' and is in the
> process of doing the rewrite.
>
> The fact that it then takes a system write fault is because some GDT
> or IDT setup is screwed up. Or possibly the stack is screwed up and
> started out as 0, and then the push to the stack would decrement the
> stack pointer and try to push the error state or something.
>
>> That's the second report I'm staring at today which has CR2
>> fffffffffffffffx and points to a faulting instruction which does not make
>> any sense at all.
>
> That actually does make sense - see above.  It just requires that race
> with the instruction rewriting.
>
> *Normally* we never actually take the "int3" exception, because
> normally we'll have completed the rewrite before another CPU actually
> executes the instruction that is being rewritten.
>
> So I'm assuming this is with the page table isolation, and some
> unusual case in exception handling got screwed up.
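Linus's explanation relies on how x86 patches live kernel text, for example when a static branch behind a tracepoint such as trace_x86_fpu_regs_deactivated() is switched on. The comment above text_poke_bp() in arch/x86/kernel/alternative.c describes three steps: write int3 over the first byte, write the remaining bytes, then replace the int3 with the real first byte, serialising all CPUs after each step. The sketch below is a simplified, single-threaded Python model of that sequence, not kernel code. It assumes a 5-byte NOP being turned into a 5-byte jmp; the helper names and the jump displacement are hypothetical, and cross-CPU synchronisation is reduced to a placeholder. It shows that after steps 1 and 2 any CPU fetching the patched address sees 0xcc, which is the window in which the reported int3 would be taken.

```python
# Simplified model of the three-step int3 patching protocol. The "kernel text"
# is a bytearray; run_sync() only stands in for the real cross-CPU sync.
# All names and the jmp displacement are hypothetical.

INT3 = 0xCC
NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])  # 5-byte NOP: nopl 0x0(%rax,%rax,1)
JMP5 = bytes([0xE9, 0x10, 0x00, 0x00, 0x00])  # jmp rel32, made-up displacement


def run_sync(log):
    # Placeholder for serialising every CPU; we only record that it happened.
    log.append("sync")


def text_poke_bp_model(text, addr, new, snapshots):
    """Patch len(new) bytes at text[addr:] using the int3 protocol.

    After each step, record the bytes another CPU could fetch at addr.
    """
    log = []
    end = addr + len(new)
    # Step 1: arm the breakpoint by overwriting only the first byte.
    text[addr] = INT3
    snapshots.append(bytes(text[addr:end]))
    run_sync(log)
    # Step 2: write everything except the first byte. A CPU reaching addr
    # now traps on int3 instead of executing a torn instruction.
    text[addr + 1:end] = new[1:]
    snapshots.append(bytes(text[addr:end]))
    run_sync(log)
    # Step 3: replace the int3 with the first byte of the new instruction.
    text[addr] = new[0]
    snapshots.append(bytes(text[addr:end]))
    run_sync(log)
    return log


text = bytearray(NOP5)
snaps = []
text_poke_bp_model(text, 0, JMP5, snaps)
for i, s in enumerate(snaps, 1):
    print(f"after step {i}: {s.hex(' ')}")
```

Normally no other CPU executes the address during the int3 window, which matches Linus's remark that the exception is usually never taken.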

SDM time.  Assuming the CPU actually decoded int3 and tried to execute
it, I can see a couple of possible outcomes:

1. Something's wrong with the IDT and it can't read the vector.  I
think this would end up triple-faulting, though.

2. It actually tries to handle the breakpoint.  A breakpoint is a
benign exception, so any exception encountered while delivering it
would result in serial delivery.  I've never thought that serial
delivery made any sense -- presumably it just cancels the breakpoint
and delivers the other exception.  So this *could* be a page fault hit
during delivery of the int3 exception.  I don't believe it's a GDT
problem, though, because that would also likely lead to a triple
fault.  What I *would* believe is that the IST table got messed up and
we're seeing the result of trying to push to the stack with the
initial RSP=0 so the fault hits at address -8.
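The "benign exception, serial delivery" point in case 2 comes from the Intel SDM's double-fault conditions table (Vol. 3A, "Interrupt 8"). A minimal Python encoding of that table follows. It is limited to the classic vector set, so newer vectors such as #VE and #CP are deliberately left out, and the function names are illustrative only. It shows that a #PF raised while delivering #BP is handled serially rather than escalating to #DF.

```python
# Exception classes from the Intel SDM Vol. 3A double-fault tables
# (classic vector set only; newer vectors such as #VE and #CP are omitted).
BENIGN = {1, 2, 3, 4, 5, 6, 7, 9, 16, 17, 18, 19}  # includes #BP (vector 3)
CONTRIBUTORY = {0, 10, 11, 12, 13}                # #DE, #TS, #NP, #SS, #GP
PAGE_FAULT = {14}                                 # #PF


def exception_class(vector):
    if vector in CONTRIBUTORY:
        return "contributory"
    if vector in PAGE_FAULT:
        return "page fault"
    if vector in BENIGN:
        return "benign"
    raise ValueError(f"vector {vector} is not modelled in this sketch")


def second_exception_outcome(first, second):
    """Outcome when `second` is raised while the CPU is delivering `first`."""
    a, b = exception_class(first), exception_class(second)
    if a == "benign" or b == "benign":
        return "handled serially"
    if a == "contributory" and b == "page fault":
        return "handled serially"
    return "double fault"


# Case 2 above: a #PF while delivering the int3 (#BP) exception.
print(second_exception_outcome(3, 14))
```

Under this table only a contributory or page-fault first exception can escalate to #DF, which is consistent with the report showing an ordinary page fault rather than a double fault.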

I have no idea how that would happen, though.  Especially since int3
from userspace would have exactly the same problem, and we exercise
that code in the selftests.
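Both numbers in the original report fit the RSP=0 hypothesis. The Python sketch below assumes a 64-bit push that decrements RSP by 8 before storing, and decodes the low page-fault error-code bits as defined in the Intel SDM (Vol. 3A, "Interrupt 14"). It reproduces the CR2 value fffffffffffffff8 and Linus's reading of "Oops: 0002" as a supervisor-mode write to a non-present page. The helper names are illustrative only.

```python
MASK64 = (1 << 64) - 1  # model 64-bit wrap-around of the stack pointer


def push_target(rsp, size=8):
    """Address written by a 64-bit push: RSP is decremented first, then stored to."""
    return (rsp - size) & MASK64


def decode_pf_error_code(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x01),  # 0 = page not present
        "write": bool(code & 0x02),                 # 0 = read access
        "user": bool(code & 0x04),                  # 0 = supervisor-mode access
        "reserved_bit": bool(code & 0x08),          # reserved bit set in a paging entry
        "instruction_fetch": bool(code & 0x10),
    }


# Hypothesis from the thread: the exception stack pointer was 0 when the CPU
# started pushing the exception frame.
cr2 = push_target(0)
print(f"CR2 = {cr2:016x}")
print(decode_pf_error_code(0x0002))
```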