linux-kernel - Re: [RFC PATCH 1/2] x86: WARN() when uaccess helpers fault on kernel addresses

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAG48ez0c0L+WyfjDcfx=3wDwOQDOSJM-2JSVUBoWJCaKDVEfbw@mail.gmail.com>
Date:   Thu, 23 Aug 2018 01:53:23 +0200
From:   Jann Horn <jannh@...gle.com>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Kernel Hardening <kernel-hardening@...ts.openwall.com>,
        kernel list <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: [RFC PATCH 1/2] x86: WARN() when uaccess helpers fault on kernel addresses

On Tue, Aug 7, 2018 at 4:55 AM Andy Lutomirski <luto@...capital.net> wrote:
> > On Aug 6, 2018, at 6:22 PM, Jann Horn <jannh@...gle.com> wrote:
> > There have been multiple kernel vulnerabilities that permitted userspace to
> > pass completely unchecked pointers through to userspace accessors:
> >
> > - the waitid() bug - commit 96ca579a1ecc ("waitid(): Add missing
> >  access_ok() checks")
> > - the sg/bsg read/write APIs
> > - the infiniband read/write APIs
> >
> > These don't happen all that often, but when they do happen, it is hard to
> > test for them properly; and it is probably also hard to discover them with
> > fuzzing. Even when an unmapped kernel address is supplied to such buggy
> > code, it just returns -EFAULT instead of doing a proper BUG() or at least
> > WARN().
> >
> > This patch attempts to make such misbehaving code a bit more visible by
> > WARN()ing in the pagefault handler code when a userspace accessor causes
> > #PF on a kernel address and the current context isn't whitelisted.
>
> I like this a lot, and, in fact, I once wrote a patch to do something similar. It was before the fancy extable code, though, so it was a mess.  Here are some thoughts:
>
> - It should be three patches. One patch to add the _UA annotations, one to improve the info passes to the handlers, and one to change behavior.
>
> - You should pass the vector, the error code, and the address to the handler.

I'm polishing the patch a bit, and I've noticed that to plumb the
error code and address through properly, I might need significantly
more code churn because of kprobes - I want to make sure I'm not going
down the completely wrong path here.

I'm extending fixup_exception() to take two extra args "unsigned long
error_code, unsigned long fault_addr". Most callers of
fixup_exception() are fairly straightforward, but
kprobe_fault_handler() has a dozen callchains from different exception
handlers, and most of them are coming via notify_die(). (My RFC patch
cheated by just feeding zeroes into fixup_exception() from
kprobe_fault_handler().) Also, annoyingly, for !CONFIG_KPROBES,
kprobe_fault_handler() is defined in include/linux/kprobes.h with a
single prototype across architectures.

Currently, for example, when do_general_protection() handles a kernel
fault, it first directly calls into fixup_exception(); and then, if
this is happening inside a kprobe, via
notify_die()->atomic_notifier_call_chain()->kprobe_exceptions_notify()->kprobe_fault_handler()->fixup_exception(),
it can end up calling fixup handlers a second time.
I think there's also some inconsistency between #PF and #GP in the
ordering of error handling:

#PF handling:
__do_page_fault
  kprobes_fault
    kprobe_fault_handler
      ->fault_handler    # first: kprobe fault handler
      fixup_exception    # second: kprobe's fixup call
  bad_area_nosemaphore
    __bad_area_nosemaphore
      no_context
        fixup_exception    # third: normal fixup call

#GP handling:
do_general_protection
  fixup_exception    # first: normal fixup call
  notify_die
    atomic_notifier_call_chain
      kprobe_exceptions_notify
        kprobe_fault_handler
          ->fault_handler    # second: kprobe fault handler
          fixup_exception    # third: kprobe's fixup call

Do you think I should actually plumb the error code and fault address
all the way through notify_die() and the kprobe handling fault?
Should I supply some "I don't want to trigger uaccess fault handlers"
flag when coming from the kprobe code?
Should the kprobe code not call into fixup_exception() at all (and if
so, should I change that somehow)?