[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wjJt49tgtmYv42bXU3h0Txb+mQZEOHseahA4EcK6s=BxA@mail.gmail.com>
Date: Thu, 21 Nov 2024 19:57:48 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: David Laight <David.Laight@...lab.com>, "x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>, Peter Zijlstra <peterz@...radead.org>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>, Waiman Long <longman@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>,
Michael Ellerman <mpe@...erman.id.au>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>, Andrew Cooper <andrew.cooper3@...rix.com>,
Mark Rutland <mark.rutland@....com>, "Kirill A . Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH v3 2/6] x86/uaccess: Avoid barrier_nospec() in 64-bit __get_user()
On Thu, 21 Nov 2024 at 19:11, Josh Poimboeuf <jpoimboe@...nel.org> wrote:
>
> On Thu, Nov 21, 2024 at 05:02:06PM -0800, Linus Torvalds wrote:
> > [ Time passes ]
> >
> > Ugh. I tried it. It looks like this:
> >
> > #define inlined_get_user(res, ptr) ({ \
> > __label__ fail2, fail1; \
> > __auto_type __up = (ptr); \
> > int __ret = 0; \
> > if (can_do_masked_user_access()) \
> > __up = masked_user_access_begin(__up); \
> > else if (!user_read_access_begin(__up, sizeof(*__up))) \
> > goto fail1; \
> > unsafe_get_user(res, ptr, fail2); \
That 'ptr' needs to be '__up' of course.
Other than that it actually seems to work.
And by "seems to work" I mean "I replaced get_user() with this macro
instead, and my default kernel continued to build fine".
I didn't actually *test* the end result, although i did look at a few
more cases of code generation, and visually it looks sane enough.
> That actually doesn't seem so bad, it's easy enough to follow the logic.
I'm not loving the "if (0)" with the labels inside of it. But it's the
only way I can see to make a statement expression like this work,
since you can't have a "return" inside of it.
> We could easily use that macro in size-specific inline functions
> selected by a macro with a sizeof(type) switch statement -- not so bad
> IMO if they improve code usage and generation.
The point of it being a macro is that then it doesn't need the
size-specific games at all, because it "just works" since the types
end up percolating inside the logic.
Of course, that depends on unsafe_get_user() itself having the
size-specific games in it, so that's a bit of a lie, but at least it
needs the games in just one place.
And yes, having inline wrappers anyway could then allow for the
compiler making the "inline or not" choice, but the compiler really
doesn't end up having any idea of whether something is more important
from a performance angle, so that wouldn't actually be a real
improvement.
> Then all the arches have to do is implement unsafe_*_user_{1,2,4,8} and
> the "one good implementation" idea comes together?
Yeah, except honestly, basically *none* of them do.
The list or architectures that actually implement the unsafe accessors
is sad: it's x86, powerpc, and arm64. That's it.
Which is - along with my bloat worry - what makes me go "it's not
worth fighting".
Also, honestly, it is *very* rare that "get_user()" and "put_user()"
is performance-critical or even noticeable. We have lots of important
user copies, and the path and argument copy (aka
"strncpy_from_user()") is very important, but very few other places
tend to be.
So I see "copy_from_user()" in profiles, and I see
"strncpy_from_user()", and I've seen a few special cases that I've
converted (look at get_sigset_argpack(), for example - it shows up on
some select-heavy loads).
And now you found another one in that futex code, and that is indeed special.
But honestly, most get_user() cases are really in things like random ioctls etc.
Which is why I suspect we'd be better off just doing the important
ones one by one.
And doing the important ones individually may sound really nasty, but
if they are important, it does kind of make sense.
For example, one place I suspect *is* worth looking at is the execve()
argument handling. But to optimize that isn't about inlining
get_user(). It's about doing more than one word for each user access,
particularly with CLAC/STAC being as slow as they often are.
So what you actually would want to do is to combine these two
ret = -EFAULT;
str = get_user_arg_ptr(argv, argc);
if (IS_ERR(str))
goto out;
len = strnlen_user(str, MAX_ARG_STRLEN);
if (!len)
goto out;
into one thing, if you really cared enough. I've looked at it, and
always gone "I don't _quite_ care that much", even though argument
handling definitely shows up on some benchmarks (partly because it
mostly shows up on fairly artificial ones - the rest of execve is so
expensive that even when we waste some time on argument handling, it's
not noticeable enough to be worth spending huge amount of effort on).
But you could also look at the pure argv counting code (aka "count()"
in fs/exec.c). That *also* shows up quite a bit, and there batching
things migth be a much easier thing. But again, it's not about
inlining get_user(), it's about doing multiple accesses without
turning user accesses on and off all the time.
Anyway, that was a long way of saying: I really think we should just
special-case the (few) important cases that get reported. Because any
*big* improvements will come not from just inlining.
Linus
Powered by blists - more mailing lists