[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wigHm2J4LkUL1=y_H8zGwM0JsK2CrWyLNbz9fvXfbaBQA@mail.gmail.com>
Date: Thu, 21 Nov 2024 14:16:12 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: David Laight <David.Laight@...lab.com>, "x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>, Peter Zijlstra <peterz@...radead.org>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>, Waiman Long <longman@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>,
Michael Ellerman <mpe@...erman.id.au>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>, Andrew Cooper <andrew.cooper3@...rix.com>,
Mark Rutland <mark.rutland@....com>, "Kirill A . Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH v3 2/6] x86/uaccess: Avoid barrier_nospec() in 64-bit __get_user()
On Thu, 21 Nov 2024 at 13:40, Josh Poimboeuf <jpoimboe@...nel.org> wrote:
>
> The profile is showing futex_get_value_locked():
Ahh.
> That has several callers, so we can probably just use get_user() there?
Yeah, that's the simplest thing. That thing isn't even some inline
function, so the real cost is the call.
That said, exactly because it's not inlined, and calls are expensive,
and this is apparently really critical, we can just do it with the
full "unsafe_get_user()" model.
It's not so complicated. The attached patch is untested, but I did
check that it generates almost perfect code:
mov %gs:0x0,%rax # current
incl 0x1a9c(%rax) # current->pagefault_disable++
movabs $0x123456789abcdef,%rcx # magic virtual address size
cmp %rsi,%rcx # address masking
sbb %rcx,%rcx
or %rsi,%rcx
stac # enable user space acccess
mov (%rcx),%ecx # get the value
clac # disable user space access
decl 0x1a9c(%rax) # current->pagefault_disable--
mov %ecx,(%rdi) # save the value
xor %eax,%eax # return 0
ret
(with the error case for the page fault all out-of-line).
So this should be _faster_ than the old __get_user(), because while
the address masking is not needed, it's cheaper than the function call
used to be and the error handling is better.
If you can test this and verify that it actually help, I'll take it as
a patch. Consider it signed-off after testing.
> Also, is there any harm in speeding up __get_user()? It still has ~80
> callers and it's likely to be slowing down things we don't know about.
How would you speed it up? We definitely can't replace the fence with
addressing tricks. So we can't just replace it with "get_user()",
because of those horrid architecture-specific kernel uses.
Now, we could possibly say "just remove the fence in __get_user()
entirely", but that would involve moving it to access_ok().
And then it wouldn't actually speed anything up (except the horrid
architecture-specific kernel uses that then don't call access_ok() at
all - and we don't care about *those*).
Linus
View attachment "patch.diff" of type "text/x-patch" (1128 bytes)
Powered by blists - more mailing lists