lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CALCETrW_Uj22bpFD09_ESXi-6Xs2v=YRq+rPeExRa+Uqqte8Aw@mail.gmail.com>
Date:   Fri, 29 Jun 2018 09:07:36 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Andrew Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-api <linux-api@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Dave Watson <davejwatson@...com>, Paul Turner <pjt@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Russell King <linux@....linux.org.uk>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Andi Kleen <andi@...stfloor.org>,
        Chris Lameter <cl@...ux.com>, Ben Maurer <bmaurer@...com>,
        rostedt <rostedt@...dmis.org>,
        Josh Triplett <josh@...htriplett.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Joel Fernandes <joelaf@...gle.com>
Subject: Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE

On Fri, Jun 29, 2018 at 8:27 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
>
> On Fri, Jun 29, 2018, 08:03 Mathieu Desnoyers
> <mathieu.desnoyers@...icios.com> wrote:
>>
>>
>> Considering those inconsistencies between architectures (either
>> the task gets killed, or the top bits are silently cleared), I'm
>> very much tempted to be restrictive in the inputs accepted by
>> rseq, and not rely on architectures as providing consistent
>> validation of the return IP.
>>
>> Thoughts ?
>
>
> Then you need to make it a compat system call, since clearly you and Andy
> want the 32-bit case to do something different from the 64-bit case.

I personally would like the compat and non-compat cases to do exactly
the same thing.  If abort_ip is the address (as a u64) of a valid
executable instruction and an abort happens, then that instruction
should get executed.  If abort_ip does not point to user-executable
memory, then the process should get a signal.

The problem isn't with rseq per se -- it's with the daft way that the
x86 return-to-userspace instructions work.  If I apply this patch:

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3b2490b81918..26e4ba44e87b 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -346,6 +346,8 @@ __visible void do_int80_syscall_32(struct pt_regs *regs)
 {
        enter_from_user_mode();
        local_irq_enable();
+       if (!user_64bit_mode(regs))
+               regs->ip |= (1UL << 32);
        do_syscall_32_irqs_on(regs);
 }

the kernel *still works*.  But this unconditionally uses the IRET
path, and I don't even want to speculate as to what the hell happens
if we exit using SYSRETL, or LRET, or SYSEXITL, or if Intel or AMD
ever gives us a new mode that gets rid of the espfix shite.  IOW, I
don't think that the x86 entry code should make a promise that it will
continue ignoring the high bits of regs->ip when
!user_64bit_mode(regs) on a 64-bit kernel.

The problem with rseq as it stands is that this oddity gets
accidentally exposed all the way to userspace.  If we're not careful,
it'll be possible for a slightly buggy user program to goof up the
code that generates the data structure that supplies abort_ip such
that the high bits are garbage (0xcccccccc due to padding, for
example), and the kernel will do exactly what the user code requested,
and we'll get regs->ip = 0xcccccccc00000000 | (the actual intended
abort_ip), and we'll pass that crap value all the way to IRET, and
IRET will truncate it back down to the correct value.  And then some
CPU will add new behavior or we'll invoke SYSRETL or whatever on some
weird CPU, and the program will crash.  And we'll be sad.

I suppose we could handle this in the entry code by coming up with a
way to reject out-of-bounds regs->ip for 32-bit tasks, but that's
going to be a bit messy and will slow down normal code that doesn't
use rseq.

Other than rseq, I don't think that there's any real issue.  The only
ways to get regs->ip >= 2^32 in a 32-bit task involve ptrace or manual
fiddling with signal contexts, and I don't expect to ever have any
real software depend on precisely what happens.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ