linux-kernel - Re: [kernel-hardening] Re: [PATCH v9 1/4] syscalls: Verify address limit before returning to user-mode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGXu5jL9vUrn4kpjO+qa4cHmWBypeqP17OGbrMs=5Nz0YpQMZw@mail.gmail.com>
Date:   Fri, 12 May 2017 12:01:59 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Martin Schwidefsky <schwidefsky@...ibm.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Garnier <thgarnie@...gle.com>, Greg KH <greg@...ah.com>,
        Ingo Molnar <mingo@...nel.org>,
        Daniel Micay <danielmicay@...il.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Arnd Bergmann <arnd@...db.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        David Howells <dhowells@...hat.com>,
        René Nyffenegger <mail@...enyffenegger.ch>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
        "Eric W . Biederman" <ebiederm@...ssion.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Pavel Tikhomirov <ptikhomirov@...tuozzo.com>,
        Ingo Molnar <mingo@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Rik van Riel <riel@...hat.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Russell King <linux@...linux.org.uk>,
        Will Deacon <will.deacon@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Mark Rutland <mark.rutland@....com>,
        James Morse <james.morse@....com>,
        linux-s390 <linux-s390@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        Kernel Hardening <kernel-hardening@...ts.openwall.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [kernel-hardening] Re: [PATCH v9 1/4] syscalls: Verify address
 limit before returning to user-mode

On Thu, May 11, 2017 at 10:54 PM, Martin Schwidefsky
<schwidefsky@...ibm.com> wrote:
> On Thu, 11 May 2017 22:34:31 -0700
> Kees Cook <keescook@...omium.org> wrote:
>
>> On Thu, May 11, 2017 at 10:28 PM, Martin Schwidefsky
>> <schwidefsky@...ibm.com> wrote:
>> > On Thu, 11 May 2017 16:44:07 -0700
>> > Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>> >
>> >> On Thu, May 11, 2017 at 4:17 PM, Thomas Garnier <thgarnie@...gle.com> wrote:
>> >> >
>> >> > Ingo: Do you want the change as-is? Would you like it to be optional?
>> >> > What do you think?
>> >>
>> >> I'm not ingo, but I don't like that patch. It's in the wrong place -
>> >> that system call return code is too timing-critical to add address
>> >> limit checks.
>> >>
>> >> Now what I think you *could* do is:
>> >>
>> >>  - make "set_fs()" actually set a work flag in the current thread flags
>> >>
>> >>  - do the test in the slow-path (syscall_return_slowpath).
>> >>
>> >> Yes, yes, that ends up being architecture-specific, but it's fairly simple.
>> >>
>> >> And it only slows down the system calls that actually use "set_fs()".
>> >> Sure, it will slow those down a fair amount, but they are hopefully a
>> >> small subset of all cases.
>> >>
>> >> How does that sound to people?  Thats' where we currently do that
>> >>
>> >>         if (IS_ENABLED(CONFIG_PROVE_LOCKING) &&
>> >>             WARN(irqs_disabled(), "syscall %ld left IRQs disabled",
>> >> regs->orig_ax))
>> >>                 local_irq_enable();
>> >>
>> >> check too, which is a fairly similar issue.
>> >
>> > This is exactly what Heiko did for the s390 backend as a result of this
>> > discussion. See the _CIF_ASCE_SECONDARY bit in arch/s390/kernel/entry.S,
>> > for the hot patch the check for the bit is included in the general
>> > _CIF_WORK test. Only the slow patch gets a bit slower.
>> >
>> > git commit b5a882fcf146c87cb6b67c6df353e1c042b8773d
>> > "s390: restore address space when returning to user space".
>>
>> If I'm understanding this, it won't catch corruption of addr_limit
>> during fast-path syscalls, though (i.e. addr_limit changed without a
>> call to set_fs()). :( This addr_limit corruption is mostly only a risk
>> archs without THREAD_INFO_IN_TASK, but it would still be nice to catch
>> unbalanced set_fs() code, so I like the idea. I like getting rid of
>> addr_limit entirely even more, but that'll take some time. :)
>
> Well for s390 there is no addr_limit as we use two separate address space
> for kernel vs. user. The equivalent to the addr_limit corruption on a
> fast-path syscall would be changing CR7 outside of set_fs. This boils
> down to the question what we are protection against? Bad code with
> unbalanced set_fs or evil code that changes addr_limit/CR7 outside of
> set_fs

Yeah, the risk for "corrupted addr_limit" is mainly a concern for
archs with addr_limit on the kernel stack. If I'm reading things
correctly, that means, from the archs I've been paying closer
attention to, it's an issue for arm, mips, and powerpc:

arch/arm/include/asm/uaccess.h: current_thread_info()->addr_limit = fs;
arch/arm/include/asm/thread_info.h:             (current_stack_pointer
& ~(THREAD_SIZE - 1));

arch/mips/include/asm/uaccess.h:#define set_fs(x)
(current_thread_info()->addr_limit = (x))
arch/mips/kernel/process.c:        * task stacks at THREAD_SIZE - 32

arch/powerpc/include/asm/uaccess.h:#define set_fs(val)
(current->thread.fs = (val))
arch/powerpc/kernel/process.c:          struct pt_regs *regs =
task_stack_page(current) + THREAD_SIZE;

(s390 uses a register, x86 and arm64 implement THREAD_INFO_IN_TASK.)
Targeting addr_limit through arbitrary write attacks isn't too common
since ... it's an arbitrary write. The issue with addr_limit was that
it can live on the kernel stack, which meant all kinds of
stack-related bugs can lead to it getting stomped on.

So, two goals to protect addr_limit:

- get it off the stack to make the difficulty of corruption on par
with other sensitive things that would require an arbitrary write
flaw.

- detect/block unbalanced set_fs() calls.

If we can get the former addressed by the remaining architectures,
then that class of attack will go away. For the latter, it sounds like
Linus's slowpath-exit will work nicely.

To me it looks like he architectures with addr_limit still on the
stack would still benefit from always-check-addr_limit on syscall
exit, but that would be arch-specific anyway.

And then, of course, we've got the parallel task of just removing
set_fs() entirely. :)

-Kees

-- 
Kees Cook
Pixel Security