linux-kernel - Re: [PATCH v3 2/6] x86/uaccess: Avoid barrier_nospec() in 64-bit __get

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whm4fEYrzrrRrqEhELLFz2xNCMT9be+J0uiR_EwXwa0DA@mail.gmail.com>
Date: Thu, 21 Nov 2024 17:02:06 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: David Laight <David.Laight@...lab.com>, "x86@...nel.org" <x86@...nel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Borislav Petkov <bp@...en8.de>, Peter Zijlstra <peterz@...radead.org>, 
	Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>, Waiman Long <longman@...hat.com>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>, 
	Michael Ellerman <mpe@...erman.id.au>, 
	"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>, Andrew Cooper <andrew.cooper3@...rix.com>, 
	Mark Rutland <mark.rutland@....com>, "Kirill A . Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH v3 2/6] x86/uaccess: Avoid barrier_nospec() in 64-bit __get_user()

On Thu, 21 Nov 2024 at 16:12, Josh Poimboeuf <jpoimboe@...nel.org> wrote:
>
> The asm looks good, but the C exploded a bit... why not just have an
> inline get_user()?

That was originally one of my goals for the "unsafe" ones - if done
right, they'd be the proper building blocks for a get_user(), and we'd
only really need one single good implementation.

But it really does blow up the code size quite noticeably. The old
sign-based thing wasn't quite so bad, and was one of the main reasons
I really didn't like having to switch to the big constant thing, but
with the constant, the "get_user()" part is basically 27 bytes per
location.

The uninlined call is 5 bytes.

(And both then have the error handling - the inlined one can use the
asm goto to *maybe* make up for some of it because it avoids tests,
but I suspect it ends up being pretty similar in the end).

So I'm not really sure inlining makes sense - except if you have code
that you've profiled.

Part of the problem is that you can't really just make an inline
function. You need to make one for each size. Which we've done, but it
gets real messy real quick. I don't want to have yet another "switch
(sizeof..)" thing.

Or you'd make it a macro (which makes dealing with different types
easy), but then it would have to be a statement expression to return
the error, and to take advantage of that exception handling error
handling gets messed up real quick too.

End result: the

        if (can_do_masked_user_access())
                from = masked_user_access_begin(from);
        else if (!user_read_access_begin(from, sizeof(*from)))
                goto enable_and_error;

        unsafe_get_user(val, from, Efault);
        user_access_end();

pattern is very good for generating fine code, but it's rather nasty
to encapsulate as one single macro somewhere. It really ends up having
those two error cases: the one that just returns the error, and the
one that has to do user_access_end().

[ Time passes ]

Ugh. I tried it. It looks like this:

#define inlined_get_user(res, ptr) ({                           \
        __label__ fail2, fail1;                                 \
        __auto_type __up = (ptr);                               \
        int __ret = 0;                                          \
        if (can_do_masked_user_access())                        \
                __up = masked_user_access_begin(__up);          \
        else if (!user_read_access_begin(__up, sizeof(*__up)))  \
                goto fail1;                                     \
        unsafe_get_user(res, ptr, fail2);                       \
        user_access_end();                                      \
        if (0) {                                                \
fail2:  user_access_end();                                      \
fail1:  __ret = -EFAULT;                                        \
        }                                                       \
        __ret; })

and maybe that works. It generated ok code in this case, where I did

  int futex_get_value_locked(u32 *dest, u32 __user *from)
  {
        int ret;

        pagefault_disable();
        ret = inlined_get_user(*dest, from);
        pagefault_enable();
        return ret;
  }

but I'm not convinced it's better than open-coding it. We have some
ugly macros in the kernel, but that may be one of the ugliest I've
ever written.

> > How would you speed it up?
>
> I'm not sure if you saw the example code snippet I posted up-thread,
> here it is below.

Oh, ok, no I hadn't seen that one.

Yeah, it looks like that would work. What a horrendous crock. But your
point that it would find the nasty __get_user() cases is nicely made.

             Linus