linux-kernel - Re: [PATCH v2] x86/dumpstack: don't dump kernel memory based on usermode RIP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez3N5wBUEDCDaSzWOWjs5rYrQ4QgNTYT=PmYmFQAcqfA-w@mail.gmail.com>
Date:   Tue, 28 Aug 2018 18:29:43 +0200
From:   Jann Horn <jannh@...gle.com>
To:     bp@...e.de
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Andy Lutomirski <luto@...nel.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>, security@...nel.org
Subject: Re: [PATCH v2] x86/dumpstack: don't dump kernel memory based on
 usermode RIP

On Tue, Aug 28, 2018 at 6:25 PM Borislav Petkov <bp@...e.de> wrote:
>
> On Tue, Aug 28, 2018 at 05:49:01PM +0200, Jann Horn wrote:
> > show_opcodes() is used both for dumping kernel instructions and for dumping
> > user instructions. If userspace causes #PF by jumping to a kernel address,
> > show_opcodes() can be reached with regs->ip controlled by the user,
> > pointing to kernel code.
>
> Yap, and people keep asking how to dump the running kernel, after
> patching and jump labels and stuff... Here's how!
>
> :-))))
>
> > Make sure that userspace can't trick us into
> > dumping kernel memory into dmesg.
> >
> > Cc: stable@...r.kernel.org
> > Fixes: 7cccf0725cf7 ("x86/dumpstack: Add a show_ip() function")
>
> I think this one is more likely:
>
>   ba54d856a9d8 ("x86/fault: Dump user opcode bytes on fatal faults")
>
> as it added the dumping of user opcode bytes.

No, you can also get user opcode bytes printed by WARN() and friends.
When you add a WARN() in the pagefault handler, you get something like
this. The first "Code:" line is from ba54d856a9d8, but the second one
further down is from before that.

[  125.564041] segfault[1602]: segfault at ffffffff854340c0 ip
ffffffff854340c0 sp 00007ffd4cc7a568 error 15
[  125.569923] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 <63> 6f 72 65 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
[  125.576859] ------------[ cut here ]------------
[  125.578406] TESTING WARN()
[  125.578439] WARNING: CPU: 6 PID: 1602 at arch/x86/mm/fault.c:894
__bad_area_nosemaphore+0x147/0x270
[  125.582172] Modules linked in: bpfilter
[  125.583394] CPU: 6 PID: 1602 Comm: segfault Tainted: G        W
    4.18.0+ #108
[  125.585811] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[  125.588410] RIP: 0010:__bad_area_nosemaphore+0x147/0x270
[  125.590078] Code: 48 89 d9 48 89 ea 44 89 e6 48 c7 83 30 0b 00 00
0e 00 00 00 bf 0b 00 00 00 e8 f5 eb ff ff 48 c7 c7 00 61 66 84 e8 79
11 05 00 <0f> 0b 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 83 c4
28 4c
[  125.595779] RSP: 0018:ffff8801cb3b7e18 EFLAGS: 00010286
[  125.597426] RAX: 0000000000000000 RBX: ffff8801cbb9e000 RCX: 0000000000000000
[  125.599605] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffff86678ea0
[  125.601800] RBP: ffffffff854340c0 R08: ffffed003d873ed5 R09: ffffed003d873ed5
[  125.603935] R10: 0000000000000001 R11: ffffed003d873ed4 R12: 0000000000000001
[  125.606113] R13: 0000000000000000 R14: 0000000000000015 R15: ffff8801cb3b7f58
[  125.608250] FS:  00007fe30d518700(0000) GS:ffff8801ec380000(0000)
knlGS:0000000000000000
[  125.610608] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  125.612331] CR2: ffffffff854340c0 CR3: 00000001d563e001 CR4: 00000000003606e0
[  125.614470] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  125.616607] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  125.618736] Call Trace:
[  125.619475]  __do_page_fault+0x133/0x780
[  125.620646]  ? mm_fault_error+0x1b0/0x1b0
[  125.622236]  ? async_page_fault+0x8/0x30
[  125.623388]  async_page_fault+0x1e/0x30
[  125.624526] RIP: 0033:core_pattern+0x0/0x880
[  125.625786] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 <63> 6f 72 65 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
[  125.631208] RSP: 002b:00007ffd4cc7a568 EFLAGS: 00010202
[  125.632737] RAX: ffffffff854340c0 RBX: 0000000000000000 RCX: 0000000000000000
[  125.635039] RDX: 00007ffd4cc7a678 RSI: 00007ffd4cc7a668 RDI: 0000000000000001
[  125.637088] RBP: 00007ffd4cc7a580 R08: 0000562d395106f0 R09: 00007fe30d323cb0
[  125.639153] R10: 0000000000000000 R11: 00007fe30d0d23c0 R12: 0000562d39510530
[  125.641183] R13: 00007ffd4cc7a660 R14: 0000000000000000 R15: 0000000000000000
[  125.643221] ---[ end trace fb20716f9d6369bd ]---

> > Reviewed-by: Kees Cook <keescook@...omium.org>
> > Signed-off-by: Jann Horn <jannh@...gle.com>
> > ---
> > v2: Andy pointed out that I probably shouldn't be doing wrapping
> > arithmetic on pointers.
> >
> >  arch/x86/include/asm/stacktrace.h |  2 +-
> >  arch/x86/kernel/dumpstack.c       | 13 ++++++++++---
> >  arch/x86/mm/fault.c               |  2 +-
> >  3 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
> > index b6dc698f992a..f335aad404a4 100644
> > --- a/arch/x86/include/asm/stacktrace.h
> > +++ b/arch/x86/include/asm/stacktrace.h
> > @@ -111,6 +111,6 @@ static inline unsigned long caller_frame_pointer(void)
> >       return (unsigned long)frame;
> >  }
> >
> > -void show_opcodes(u8 *rip, const char *loglvl);
> > +void show_opcodes(struct pt_regs *regs, const char *loglvl);
> >  void show_ip(struct pt_regs *regs, const char *loglvl);
> >  #endif /* _ASM_X86_STACKTRACE_H */
> > diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> > index 9c8652974f8e..14b337582b6f 100644
> > --- a/arch/x86/kernel/dumpstack.c
> > +++ b/arch/x86/kernel/dumpstack.c
> > @@ -89,14 +89,21 @@ static void printk_stack_address(unsigned long address, int reliable,
> >   * Thus, the 2/3rds prologue and 64 byte OPCODE_BUFSIZE is just a random
> >   * guesstimate in attempt to achieve all of the above.
> >   */
> > -void show_opcodes(u8 *rip, const char *loglvl)
> > +void show_opcodes(struct pt_regs *regs, const char *loglvl)
> >  {
> >  #define PROLOGUE_SIZE 42
> >  #define EPILOGUE_SIZE 21
> >  #define OPCODE_BUFSIZE (PROLOGUE_SIZE + 1 + EPILOGUE_SIZE)
> >       u8 opcodes[OPCODE_BUFSIZE];
> > +     u8 *prologue = (u8 *)(regs->ip - PROLOGUE_SIZE);
> > +     /*
> > +      * Make sure userspace isn't trying to trick us into dumping kernel
> > +      * memory by pointing the userspace instruction pointer at it.
> > +      */
> > +     bool bad_ip = user_mode(regs) &&
> > +                   __range_not_ok(prologue, OPCODE_BUFSIZE, TASK_SIZE_MAX);
> >
>
> Ok, can we pls move the sole dumping of opcodes in a helper called,
> __show_opcodes(), for example, which the checking wrapper show_opcodes()
> - without the "__" prefix - calls?
>
> So that show_signal_msg() can call the checking variant - show_opcodes()
> - as userspace might be doing monkey business there and we definitely
> wanna check first but __show_regs() can call the non-checking variant
> __show_opcodes() because there we wanna dump whatever rIP points to
> because we wanna know if the machine has gone off into the weeds etc,
> when staring at splats.
>
> Or am I missing a security aspect here?

See above.

I'm checking for user_mode(regs), so as long as CS has a kernel
segment loaded, my patch shouldn't change anything, no matter where
RIP points.