linux-kernel - Re: [kernel-hardening] Re: [PATCH v9 1/4] syscalls: Verify address limit before returning to user-mode

Message-ID: <CAGXu5jL-qvFxLkJZSosAovK4qL5eLPOD7orpei42x6mK_tBXhw@mail.gmail.com>
Date:   Tue, 9 May 2017 09:30:02 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     Daniel Micay <danielmicay@...il.com>,
        Thomas Garnier <thgarnie@...gle.com>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Arnd Bergmann <arnd@...db.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        David Howells <dhowells@...hat.com>,
        René Nyffenegger <mail@...enyffenegger.ch>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
        "Eric W . Biederman" <ebiederm@...ssion.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Pavel Tikhomirov <ptikhomirov@...tuozzo.com>,
        Ingo Molnar <mingo@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Rik van Riel <riel@...hat.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Russell King <linux@...linux.org.uk>,
        Will Deacon <will.deacon@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Mark Rutland <mark.rutland@....com>,
        James Morse <james.morse@....com>,
        linux-s390 <linux-s390@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        Kernel Hardening <kernel-hardening@...ts.openwall.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [kernel-hardening] Re: [PATCH v9 1/4] syscalls: Verify address
 limit before returning to user-mode

On Mon, May 8, 2017 at 11:56 PM, Ingo Molnar <mingo@...nel.org> wrote:
>
> * Kees Cook <keescook@...omium.org> wrote:
>
>> > There's the option of using GCC plugins now that the infrastructure was
>> > upstreamed from grsecurity. It can be used as part of the regular build
>> > process and as long as the analysis is pretty simple it shouldn't hurt compile
>> > time much.
>>
>> Well, and that the situation may arise due to memory corruption, not from
>> poorly-matched set_fs() calls, which static analysis won't help solve. We need
>> to catch this bad kernel state because it is a very bad state to run in.

[attempting some thread-merging]

> Ok, so that's CVE-2010-4258, where an oops with KERNEL_DS set was used to escalate
> privileges, due to the kernel's oops handler not cleaning up the KERNEL_DS. The
> exploit used another bug, a crash in a network protocol handler, to execute the
> oops handler with KERNEL_DS set.

Right, I didn't mean to suggest that vulnerability would be fixed by
this solution. I was trying to show how there can be some pretty
complex interaction with exceptions/interrupts/etc that would make
pure static analysis still miss things.

> If memory corruption corrupted the task state into having addr_limit set to
> KERNEL_DS then there's already a fair chance that it's game over: it could also
> have set *uid to 0, or changed a sensitive PF_ flag, or a number of other
> things...
>
> Furthermore, think about it: there's literally an infinite amount of corrupted
> task states that could be a security problem and that could be checked after every
> system call. Do we want to check every one of them?

Right, but this "slippery slope" argument isn't the best way to reject
security changes. Let me take a step back and describe the threat, and
where we should likely spend time:

The primary threat with addr_limit getting changed is that a
narrowly-scoped attack (traditionally stack exhaustion or
adjacent-stack large-index writes) could be leveraged into opening the
entire kernel to writes (by allowing all syscalls with a
copy_to_user() call to suddenly be able to write to kernel memory).
So, really, the flaw is having addr_limit at all. Removing set_fs()
should, I think, allow this to become a const (or at least should get
us a lot closer).
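
To make the threat concrete, here is a minimal user-space sketch of the
range check that guards copy_to_user(). It is a toy model, not kernel
code: the model_* names and both limit constants are assumptions chosen
for illustration, and the real access_ok() is architecture-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real limits depend on the architecture. */
#define MODEL_USER_DS   UINT64_C(0x00007ffffffff000) /* assumed top of user space */
#define MODEL_KERNEL_DS UINT64_MAX                   /* "no limit" */

/* Hypothetical stand-in for the per-task state that holds addr_limit. */
struct model_task {
    uint64_t addr_limit;
};

/*
 * Simplified check in the spirit of access_ok(): the whole range
 * [addr, addr + size) must fit below the task's addr_limit.
 * Written so that the arithmetic cannot overflow.
 */
static bool model_access_ok(const struct model_task *t,
                            uint64_t addr, uint64_t size)
{
    if (size > t->addr_limit)
        return false;
    return addr <= t->addr_limit - size;
}
```

With addr_limit at the user value, a kernel-range destination is
rejected. Once the same field holds the kernel value, the identical
call succeeds, which is why a single corrupted word turns every
copy_to_user() caller into a kernel write primitive.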

The main path to corrupting addr_limit has been via stack corruption.
On architectures with CONFIG_THREAD_INFO_IN_TASK, this risk is greatly
reduced already, but it's not universally available yet. (And as long
as we're talking about stack attacks, CONFIG_VMAP_STACK makes
cross-stack overflows go away, and cross-stack indexing harder, but
that's not really about addr_limit, since every architecture that
currently has VMAP_STACK already has THREAD_INFO_IN_TASK.)
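
The stack-corruption path can be sketched with a flat byte buffer. This
is a toy model under stated assumptions: the toy_* names, the 64-byte
region, and the placement of the limit at offset 0 are all hypothetical,
standing in for a layout where thread_info sits at the lowest address of
the kernel stack and the stack grows down toward it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_REGION_SIZE  64u
#define TOY_LIMIT_OFFSET 0u   /* assumed: limit stored at the lowest address */

struct toy_stack_region {
    unsigned char bytes[TOY_REGION_SIZE]; /* limit at [0, 8), stack above it */
    size_t sp;                            /* offset of the current stack top */
};

static void toy_region_init(struct toy_stack_region *r, uint64_t limit)
{
    memset(r->bytes, 0, sizeof r->bytes);
    memcpy(r->bytes + TOY_LIMIT_OFFSET, &limit, sizeof limit);
    r->sp = TOY_REGION_SIZE;              /* empty stack starts at the top */
}

/*
 * Push len bytes with no check against the limit field, modelling stack
 * exhaustion. The length is clamped to the region so the model itself
 * never writes out of bounds.
 */
static void toy_push(struct toy_stack_region *r, unsigned char fill, size_t len)
{
    if (len > r->sp)
        len = r->sp;
    r->sp -= len;
    memset(r->bytes + r->sp, fill, len);
}

static uint64_t toy_read_limit(const struct toy_stack_region *r)
{
    uint64_t limit;
    memcpy(&limit, r->bytes + TOY_LIMIT_OFFSET, sizeof limit);
    return limit;
}
```

In this model a push that runs to the bottom of the region overwrites
the limit in place. Moving the field out of the stack region, which is
the effect attributed to THREAD_INFO_IN_TASK above, removes it from the
reach of that particular overflow.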

So, left with a still exploitable target in memory that allows such an
expansion of attack method, I still think it's worth keeping this
patch series, but if we can drop set_fs() I could probably be
convinced the benefit of the series doesn't exceed the cost on
THREAD_INFO_IN_TASK-architectures (x86, arm64, s390). But that means
at least currently keeping it on arm, for example. If we can make
addr_limit const, well, we don't need the series at all.
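
For readers following the thread, the kind of check under discussion can
be modelled in user space as below. This is a sketch only, not the code
from the patch series: every sim_* name is hypothetical, the limit
values are illustrative, and the "syscall" is an invented example of a
poorly-matched set_fs() pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only. */
#define SIM_USER_DS   UINT64_C(0x00007ffffffff000)
#define SIM_KERNEL_DS UINT64_MAX

struct sim_task {
    uint64_t addr_limit;
};

/* Model of set_fs(): install a new limit and hand back the old one. */
static uint64_t sim_set_fs(struct sim_task *t, uint64_t new_limit)
{
    uint64_t old = t->addr_limit;
    t->addr_limit = new_limit;
    return old;
}

/* Invented syscall that forgets to restore the limit on its error path. */
static int sim_buggy_syscall(struct sim_task *t, bool fail_early)
{
    uint64_t old = sim_set_fs(t, SIM_KERNEL_DS);

    if (fail_early)
        return -1;            /* bug: returns with the kernel limit set */

    sim_set_fs(t, old);
    return 0;
}

/* Check run on the way back to user mode: the limit must be the user one. */
static bool sim_exit_state_ok(const struct sim_task *t)
{
    return t->addr_limit == SIM_USER_DS;
}
```

The exit check does not care how the limit ended up wrong. A missed
restore and a memory-corrupting write both leave the same bad value
behind, so a runtime check catches cases that static analysis of
set_fs() pairing would not.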

-Kees

-- 
Kees Cook
Pixel Security