Date:   Thu, 22 Nov 2018 11:32:31 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     pabeni@...hat.com, Jens Axboe <axboe@...nel.dk>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
        Peter Anvin <hpa@...or.com>,
        the arch/x86 maintainers <x86@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrew Lutomirski <luto@...nel.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>, dvlasenk@...hat.com,
        brgerst@...il.com,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Wed, Nov 21, 2018 at 10:16 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > It might be interesting to just change raw_copy_to/from_user() to
> > handle a lot more cases (in particular, handle cases where 'size' is
> > 8-byte aligned). The special cases we *do* have may not be the right
> > ones (the 10-byte case in particular looks odd).
> >
> > For example, instead of having a "if constant size is 8 bytes, do one
> > get/put_user()" case, we might have a "if constant size is < 64 just
> > unroll it into get/put_user()" calls.
> 
> Actually, x86 doesn't even set INLINE_COPY_TO_USER, so I don't think
> the constant size cases ever trigger at all the way they are set up
> now.

Side note, there's one artifact the patch exposes: some of the 
__builtin_constant_p() checks are imprecise and don't trigger at the 
early stage where GCC evaluates them, even though the length is actually 
known to the compiler at later optimization stages.

This means that with Jens's patch some of the length checks go away. I 
checked x86-64 defconfig and a distro config, and roughly 7% and 10% of 
the checks went away respectively, so it's not a big effect.
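
(A contrived sketch of the effect, with hypothetical helper names - 
copy_len() and do_copy() are not real kernel functions: an early-folded 
__builtin_constant_p() test can miss a length that later inlining and 
constant propagation would make perfectly known:)

	/* The length only becomes a compile-time constant after inlining: */
	static inline unsigned long copy_len(int small)
	{
		return small ? 8 : 128;
	}

	static inline void do_copy(void *dst, const void *src, unsigned long len)
	{
		/*
		 * If this check is folded before copy_len() has been inlined
		 * and propagated, it says "not constant" and the fast path
		 * is skipped.
		 */
		if (__builtin_constant_p(len) && len == 8)
			__builtin_memcpy(dst, src, 8);		/* constant-size fast path */
		else
			__builtin_memcpy(dst, src, len);	/* generic path */
	}

	void caller(void *dst, const void *src)
	{
		do_copy(dst, src, copy_len(1));	/* len is 8 after optimization */
	}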

The kernel text size reduction with Jens's patch is small but real:

 text		data		bss		dec		hex	filename
 19572694	11516934	19873888	50963516	309a43c	vmlinux.before
 19572468	11516934	19873888	50963290	309a35a	vmlinux.after

But I checked the disassembly, and it's not a real win: the new code is 
actually more complex than the old one, as expected, but GCC (7.3.0) also 
does some particularly stupid things which bloat the generated code.

> I do have a random patch that makes "unsafe_put_user()" actually use
> "asm goto" for the error case, and that, together with the attached
> patch seems to generate fairly nice code, but even then it would
> depend on gcc actually unrolling things (which we do *not* want in
> general).
> 
> But for a 32-byte user copy (cp_old_stat), and that
> INLINE_COPY_TO_USER, it generates this:
> 
>         stac
>         movl    $32, %edx       #, size
>         movq    %rsp, %rax      #, src
> .L201:
>         movq    (%rax), %rcx    # MEM[base: src_155, offset: 0B],
> MEM[base: src_155, offset: 0B]
> 1:      movq %rcx,0(%rbp)       # MEM[base: src_155, offset: 0B],
> MEM[(struct __large_struct *)dst_156]
> ASM_EXTABLE_HANDLE from=1b to=.L200 handler="ex_handler_uaccess"        #
> 
>         addq    $8, %rax        #, src
>         addq    $8, %rbp        #, statbuf
>         subq    $8, %rdx        #, size
>         jne     .L201   #,
>         clac
> 
> which is actually fairly close to "optimal".

Yeah, that looks pretty sweet!

> Random patch (with my "asm goto" hack included) attached, in case
> people want to play with it.

Doesn't even look all that hacky to me. Any hack in it that I didn't 
notice? :-)

The only question is the inlining overhead - will try to measure that.
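
For reference, the 'asm goto' trick for the error case would look roughly 
like this (sketch only, single 8-byte case, not the actual patch - the 
real macro has to cover all operand sizes, and the store target is passed 
as an input "m" operand because asm goto cannot have output operands):

	/*
	 * Sketch: on a fault the exception table entry redirects execution
	 * straight to the caller-supplied error label, so there's no error
	 * code to test after the store.
	 */
	#define unsafe_put_user_u64(val, uptr, err_label)			\
		asm_volatile_goto("1:	movq %1, %0\n"				\
				  _ASM_EXTABLE(1b, %l2)			\
				  : /* no outputs with asm goto */		\
				  : "m" (*(u64 __user *)(uptr)),		\
				    "er" ((u64)(val))				\
				  : : err_label)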

> Impressively, it actually removes more lines of code than it adds. But
> I didn't actually check whether the end result *works*, so hey..

Most of the line-count reduction appears to come from simplifying the 
unrolled copy and moving it into C, from a 6-way hard-coded copy 
routine:

> -	switch (size) {
> -	case 1:
> -	case 2:
> -	case 4:
> -	case 8:
> -	case 10:
> -	case 16:

to a more flexible 4-way loop unrolling:

> +	while (size >= sizeof(unsigned long)) {
> +	while (size >= sizeof(unsigned int)) {
> +	while (size >= sizeof(unsigned short)) {
> +	while (size >= sizeof(unsigned char)) {

Which is a nice improvement in itself.
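
Spelled out, the unrolled C version would be something along these lines 
(my sketch, not the actual patch - error handling reduced to a single 
fault label, and it assumes the caller brackets the copy with 
user_access_begin()/user_access_end()):

	static unsigned long
	unrolled_copy_to_user(void __user *dst, const void *src, unsigned long size)
	{
		while (size >= sizeof(unsigned long)) {
			unsafe_put_user(*(const unsigned long *)src,
					(unsigned long __user *)dst, fault);
			src += sizeof(unsigned long);
			dst += sizeof(unsigned long);
			size -= sizeof(unsigned long);
		}
		while (size >= sizeof(unsigned int)) {
			unsafe_put_user(*(const unsigned int *)src,
					(unsigned int __user *)dst, fault);
			src += sizeof(unsigned int);
			dst += sizeof(unsigned int);
			size -= sizeof(unsigned int);
		}
		while (size >= sizeof(unsigned short)) {
			unsafe_put_user(*(const unsigned short *)src,
					(unsigned short __user *)dst, fault);
			src += sizeof(unsigned short);
			dst += sizeof(unsigned short);
			size -= sizeof(unsigned short);
		}
		while (size >= sizeof(unsigned char)) {
			unsafe_put_user(*(const unsigned char *)src,
					(unsigned char __user *)dst, fault);
			src += sizeof(unsigned char);
			dst += sizeof(unsigned char);
			size -= sizeof(unsigned char);
		}
		return 0;
	fault:
		return size;	/* bytes not copied */
	}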

> +	user_access_begin();
> +	if (dirent)
> +		unsafe_put_user(offset, &dirent->d_off, efault_end);
>  	dirent = buf->current_dir;
> +	unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
> +	unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
> +	unsafe_put_user(0, dirent->d_name + namlen, efault_end);
> +	unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end);
> +	user_access_end();
> +
>  	if (copy_to_user(dirent->d_name, name, namlen))
>  		goto efault;
>  	buf->previous = dirent;
>  	dirent = (void __user *)dirent + reclen;
>  	buf->current_dir = dirent;
>  	buf->count -= reclen;
>  	return 0;
> +efault_end:
> +	user_access_end();
>  efault:
>  	buf->error = -EFAULT;
>  	return -EFAULT;

In terms of high-level APIs, could we perhaps use the opportunity to 
introduce unsafe_write_user() instead, which would allow us to write it 
as:

	unsafe_write_user(&dirent->d_ino, d_ino, efault_end);
	unsafe_write_user(&dirent->d_reclen, reclen, efault_end);
	unsafe_write_user(dirent->d_name + namlen, 0, efault_end);
	unsafe_write_user((char __user *)dirent + reclen - 1, d_type, efault_end);

	if (copy_to_user(dirent->d_name, name, namlen))
		goto efault;

This gives it the regular 'VAR = VAL;' notation of C assignments, instead 
of the weird historical reverse notation that put_user()/get_user() use.
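
(As a first step this would not even need new low-level machinery: a 
hypothetical unsafe_write_user() could start out as a trivial 
argument-swapping wrapper around the existing unsafe_put_user() - sketch:)

	/* Sketch: destination first, value second, same error label. */
	#define unsafe_write_user(ptr, val, label) \
		unsafe_put_user(val, ptr, label)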

Note how this newfangled ordering now matches the natural C-assignment 
parameter order of the copy_to_user() call that comes straight afterwards, 
and makes it obvious that the dirent->d_name + namlen write was placing 
the 0 delimiter at the end of the name.

I think we even had bugs from put_user() ordering mixups?

Or is it too late to try to fix this particular mistake?

Thanks,

	Ingo
