lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e2b4c397-629b-f2d2-38d0-15eaf481c792@zytor.com>
Date:   Wed, 12 May 2021 10:50:22 -0700
From:   "H. Peter Anvin" <hpa@...or.com>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle
 all invalid syscall nrs

On 5/12/21 1:51 AM, Ingo Molnar wrote:
> 
> I've applied patches 1-6, thanks Peter!
> 
> Wrt. patch #7 - the changelog is hedging things a bit and the changes are
> non-trivial. Does this patch (intend to) change any actual observable
> behavior in the system call interface, and if yes, in which areas?
> 
> Or is this a pure cleanup with no observable changes expected?
> 

I'll clean up the comments a bit [including per tglx' review.] I'm 
writing this email in part to organize my own thoughts in how to better 
explain the motivation for the patch, and the extent of visible differences.

Right now, *some* code will treat e.g. 0x0000000100000001 as a system 
call and some will not. Some of the code, notably in ptrace, will treat 
0x000000018000000 as a system call and some will not. Finally, right 
now, e.g. 335 for x86-64 will force the exit code to be set to -ENOSYS 
even if poked by ptrace, but 548 will not, because there is an 
observable difference between an out of range system call and a system 
call number that falls outside the range of the table.

The use of a non-system-call number in a system call can come in in a 
few ways:

    1. Via ptrace;
    2. From seccomp;
    3. By explicitly passing %eax = -1 to a system call.

#3 isn't really a problem *unless* it for some reason confuses ptrace or 
seccomp users -- we could do an early-out for it.

For ptrace and seccomp, we enter with the return value (regs->ax) set to 
-ENOSYS, regardless of if the system call number is valid or not. 
ptrace/seccomp has the option of independently emulate a system call, 
then set regs->orig_ax to -1 and provide any chosen return value in 
regs->ax. In that case, we must *not* update regs->ax to -ENOSYS before 
returning.

The arch-independent code all assumes that a system call is "int" that 
the value -1 specifically and not just any negative value is used for a 
non-system call. This is the case on x86 as well when arch-independent 
code is involved. The arch-independent API is defined/documented (but 
not *implemented*!) in <asm-generic/syscall.h>, and what this patch is 
intended to do is to make the x86-specific code follow.

As everyone either does or should know, treating the same data in 
different ways in different flows is a security hole just waiting to 
happen.

Most of these are relatively recently introduced bugs in x86-64; the 
original assembly code version zero-extended %rax, compared it against 
the length of the system call table, and would unconditionally return 
-ENOSYS otherwise. Then *at least* on two separate occasions someone 
"optimized" it by removing "movl %eax, %eax" not understanding that this 
is not a noop in x86-64 but a zero-extend (perhaps gas ought to be able 
to allow movzlq as an alias...) introducing a critical security bug 
which was one of the motivators for the SMAP CPU feature.

On x86-64, the ABI is that the callee is responsible for extending 
parameters, so only examining the lower 32 bits is fully consistent with 
any "int" argument to any system call, e.g. regs->di for write(2).

Andy L. and I had a fairly long discussion about this, and some of this 
was updated after the first RFC, but I fully agree I'm not capturing it 
all that well. I hope the above points clear things up better and I'll 
rewrite this into a better patch description.

	-hpa

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ