[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241115230653.hfvzyf3aqqntgp63@jpoimboe>
Date: Fri, 15 Nov 2024 15:06:53 -0800
From: 'Josh Poimboeuf' <jpoimboe@...nel.org>
To: David Laight <David.Laight@...LAB.COM>
Cc: "x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Peter Zijlstra <peterz@...radead.org>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Waiman Long <longman@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Ingo Molnar <mingo@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Michael Ellerman <mpe@...erman.id.au>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
Andrew Cooper <andrew.cooper3@...rix.com>,
Mark Rutland <mark.rutland@....com>,
"Kirill A . Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH v3 2/6] x86/uaccess: Avoid barrier_nospec() in 64-bit
__get_user()
On Fri, Nov 08, 2024 at 05:12:53PM +0000, David Laight wrote:
> From: Josh Poimboeuf
> > On Mon, Oct 28, 2024 at 06:56:15PM -0700, Josh Poimboeuf wrote:
> > > The barrier_nospec() in 64-bit __get_user() is slow. Instead use
> > > pointer masking to force the user pointer to all 1's if a previous
> > > access_ok() mispredicted true for an invalid address.
> >
> > Linus pointed out that __get_user() may be used by some code to access
> > both kernel and user space and in fact I found one such usage in
> > vc_read_mem()....
.. which sucks because I got a "will-it-scale.per_process_ops 1.9%
improvement" report for this patch.
It's sad that __get_user() is now slower than get_user() on x86, it kind
of defeats the whole point!
I know at least the "coco" code is misusing __get_user(). Unless
somebody wants to audit all the other callers, we could do something
horrific:
.macro __get_user_nocheck_nospec
#ifdef CONFIG_X86_64
movq $0x0123456789abcdef, %rdx
1:
.pushsection runtime_ptr_USER_PTR_MAX, "a"
.long 1b - 8 - .
.popsection
cmp %rax, %rdx
jb 10f
sbb %rdx, %rdx
or %rdx, %rax
jmp 11f
10: /*
* Stop access_ok() branch misprediction -- both of them ;-)
*
* As a benefit this also punishes callers who intentionally call this
* with a kernel address. Once they're rooted out, __get_user() can
* just become an alias of get_user().
*
* TODO: Add WARN_ON()
*/
#endif
ASM_BARRIER_NOSPEC
11:
.endm
/* .. and the same for __get_user, just without the range checks */
SYM_FUNC_START(__get_user_nocheck_1)
__get_user_nocheck_nospec
ASM_STAC
UACCESS movzbl (%_ASM_AX),%edx
xor %eax,%eax
ASM_CLAC
RET
SYM_FUNC_END(__get_user_nocheck_1)
EXPORT_SYMBOL(__get_user_nocheck_1)
Yes, I know adding another access_ok() is bad, but it would be a
definite speedup. And adding a WARN_ON() would root out any other bad
callers pretty quick.
> But I've wondered if access_ok() ought to be implemented using an
> 'asm goto with output' - much like get_user().
>
> Then the use would be:
> masked_address = access_ok(maybe_bad_address, size, jump_label);
> with later user accesses using the masked_address.
>
> Once you've done that __get_user() doesn't need to contain address masking.
Sure, we just need a volunteer to change all the access_ok() implementations
and callers tree-wide ;-)
> Given that clac/stac iare so slow should there are be something that
> combines stac with access_ok() bracketed with a 'user_access_end'
> or an actual fault.
>
> I've sure there is code (maybe reading iovec[] or in sys_poll())
> that wants to do multiple get/put_user in a short loop rather that
> calling copy_to/from_user().
We already have this with user_access_begin() + unsafe_get_user().
There's also a version which masks the address: masked_user_access_begin().
We just need to start porting things over.
--
Josh
Powered by blists - more mailing lists