[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4e2ed7a9cbf54eeabe9be7764141f0d2@AcuMS.aculab.com>
Date: Mon, 25 Nov 2024 16:48:59 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Linus Torvalds' <torvalds@...ux-foundation.org>
CC: Andrew Cooper <andrew.cooper3@...rix.com>, "bp@...en8.de" <bp@...en8.de>,
Josh Poimboeuf <jpoimboe@...nel.org>, "x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Arnd Bergmann
<arnd@...nel.org>, Mikel Rychliski <mikel@...elr.com>, Thomas Gleixner
<tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Dave Hansen
<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: RE: [PATCH v2] x86: Allow user accesses to the base of the guard page
From: Linus Torvalds
> Sent: 24 November 2024 22:39
>
> On Sun, 24 Nov 2024 at 14:03, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > On Sun, 24 Nov 2024 at 12:49, David Laight <David.Laight@...lab.com> wrote:
> > >
> > > Doesn't that just need a <= changed to < ?
> > > (And possibly of name)
> >
> > Well, more importantly, it means that we can't use the same helper
> > function at all.
>
> Oh, the 'sbb' trick also ends up being one byte off if we increase
> USER_PTR_MAX from being the "maximum valid address" to being "one past
> the max valid address".
>
> And yeah, none of this *matters*, since "one byte off" is then covered
> by the fact that we have that guard page, but this just ends up
> annoying me sense of how it all *should* work.
Maybe we need to look at what this code is trying to achieve.
It seems to be trying to do two different things:
1) Stop real user accesses to kernel memory.
2) Stop speculative accesses to user-defined kernel addresses.
But they get rather mixed together in this pattern:
+ if (can_do_masked_user_access())
+ from = masked_user_access_begin(from);
+ else if (!user_read_access_begin(from, sizeof(*from)))
+ return -EFAULT;
+ unsafe_get_user(val, from, Efault);
+ user_access_end();
+ *dest = val;
+ return 0;
+Efault:
+ user_access_end();
+ return -EFAULT;
where the masked address designed to stop speculative accesses is
used for a real access.
I was looking at the epoll code.
It does an early access_ok() and then two __put_user() calls to write a 32bit
and 64bit value (with an architecture dependant stride or 12 or 16 bytes).
With access_ok() (possibly just checking the base address if there is a
guard page) it doesn't matter which order the accesses are done in.
But masked_user_access_begin() is going to convert a kernel address to ~0,
this is fine for speculative accesses but for real accesses it becomes
imperative that the first address is to the base address.
(Otherwise the accesses will go to page 0 at the bottom of userspace
which (IIRC) it has to possible to map - so won't always fault.)
Relying on that seems dangerous.
So perhaps rename things a bit so the above starts:
if (have_guard_page())
from = bound_to_guard_page_user_access_begin(from);
The default bound_to_guard_page() would be min(addr, guard_page).
Architectures that have 'speculative read issues' would need
that min() to be done without a branch (eg with cmov) but
others may not care.
ppc, of course, needs the length (I don't know whether a guard page
would help or is needed).
After that the order of the accesses doesn't matter - provided they
don't have page sized jumps.
Then rename USER_PTR_MAX to USER_GUARD_PAGE_ADDRESS.
After all it doesn't matter at all where the unwanted speculative
reads end up - provided it isn't a user-defined kernel address.
The base of the guard page is as good as anywhere else.
> I'm starting to think that instead of changing the USER_PTR_MAX value
> (that was selected to be the right constant for things that don't care
> about the size), we just say "the slow case of access_ok() takes a
> tiny hit".
>
> Kind of like Mikel Rychliski's patch, but we can certainly do better,
> ie something like
>
> --- a/arch/x86/include/asm/uaccess_64.h
> +++ b/arch/x86/include/asm/uaccess_64.h
> @@ -101,8 +101,9 @@ static inline bool __access_ok(..
> return valid_user_address(ptr);
> } else {
> unsigned long sum = size + (__force unsigned long)ptr;
> + unsigned long max = runtime_const_ptr(USER_PTR_MAX)+1;
>
> - return valid_user_address(sum) && sum >= (__force
> unsigned long)ptr;
> + return sum <= max && sum >= (__force unsigned long)ptr;
> }
> }
> #define __access_ok __access_ok
>
> would seem like it should work, and now doesn't make the fast-cases be
> off by one.
I don't think it really matters whether access_ok() returns 0 or
a later access faults.
Particularly in the corner case of an access to the base of the guard page.
>
> Yes, it adds a "add 1" to the runtime path (instead of the init-time
> constant fixup), which is admittedly annoying. But this really
> *should* be the slow path.
There is no real reason why you can't have two constants that are
one apart.
> We do have a few annoying non-constant access_ok() calls in core code.
> The iter code uses access_ok + raw_copy_to_user, because it's evil and
> bad. I'm really not sure why it does that. I think it's *purely* bad
> history, ie we used to do access_ok() far away from the iov_iter copy,
> and then did __copy_from_user(), and then at some point it got changed
> to have the access_ok() closer, rather than just use
> "copy_from_user()".
Is there still an access_ok() in iovec_import (I've not got a tree handy).
All seemed too far away from any actual copy to me.
(IIRC there is a pretty pointless 'direction' flag as well?)
> So I get the feeling that those access_ok() cases could be removed
> entirely in favor of just using the right user copy function. Getting
> rid of the whole silly size checking thing.
That would be a plan.
Would be interesting to unravel what io_uring (or was it bpf) was
doing where it casts a 'long' to a user pointer just to validate it.
> David, does that patch above work for you?
You are the boss :-)
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Powered by blists - more mailing lists