linux-kernel - RE: [PATCH v2] x86: Allow user accesses to the base of the guard page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4e2ed7a9cbf54eeabe9be7764141f0d2@AcuMS.aculab.com>
Date: Mon, 25 Nov 2024 16:48:59 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Linus Torvalds' <torvalds@...ux-foundation.org>
CC: Andrew Cooper <andrew.cooper3@...rix.com>, "bp@...en8.de" <bp@...en8.de>,
	Josh Poimboeuf <jpoimboe@...nel.org>, "x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Arnd Bergmann
	<arnd@...nel.org>, Mikel Rychliski <mikel@...elr.com>, Thomas Gleixner
	<tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Dave Hansen
	<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: RE: [PATCH v2] x86: Allow user accesses to the base of the guard page

From: Linus Torvalds
> Sent: 24 November 2024 22:39
> 
> On Sun, 24 Nov 2024 at 14:03, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > On Sun, 24 Nov 2024 at 12:49, David Laight <David.Laight@...lab.com> wrote:
> > >
> > > Doesn't that just need a <= changed to < ?
> > > (And possibly of name)
> >
> > Well, more importantly, it means that we can't use the same helper
> > function at all.
> 
> Oh, the 'sbb' trick also ends up being one byte off if we increase
> USER_PTR_MAX from being the "maximum valid address" to being "one past
> the max valid address".
> 
> And yeah, none of this *matters*, since "one byte off" is then covered
> by the fact that we have that guard page, but this just ends up
> annoying me sense of how it all *should* work.

Maybe we need to look at what this code is trying to achieve.
It seems to be trying to do two different things:
1) Stop real user accesses to kernel memory.
2) Stop speculative accesses to user-defined kernel addresses. 

But they get rather mixed together in this pattern:
+	if (can_do_masked_user_access())
+		from = masked_user_access_begin(from);
+	else if (!user_read_access_begin(from, sizeof(*from)))
+		return -EFAULT;
+	unsafe_get_user(val, from, Efault);
+	user_access_end();
+	*dest = val;
+	return 0;
+Efault:
+	user_access_end();
+	return -EFAULT;

where the masked address designed to stop speculative accesses is
used for a real access.

I was looking at the epoll code.
It does an early access_ok() and then two __put_user() calls to write a 32bit
and 64bit value (with an architecture dependant stride or 12 or 16 bytes).

With access_ok() (possibly just checking the base address if there is a
guard page) it doesn't matter which order the accesses are done in.
But masked_user_access_begin() is going to convert a kernel address to ~0,
this is fine for speculative accesses but for real accesses it becomes
imperative that the first address is to the base address.
(Otherwise the accesses will go to page 0 at the bottom of userspace
which (IIRC) it has to possible to map - so won't always fault.)

Relying on that seems dangerous.

So perhaps rename things a bit so the above starts:
	if (have_guard_page())
		from = bound_to_guard_page_user_access_begin(from);
The default bound_to_guard_page() would be min(addr, guard_page).
Architectures that have 'speculative read issues' would need
that min() to be done without a branch (eg with cmov) but
others may not care.

ppc, of course, needs the length (I don't know whether a guard page
would help or is needed).

After that the order of the accesses doesn't matter - provided they
don't have page sized jumps.

Then rename USER_PTR_MAX to USER_GUARD_PAGE_ADDRESS.
After all it doesn't matter at all where the unwanted speculative
reads end up - provided it isn't a user-defined kernel address.
The base of the guard page is as good as anywhere else.

> I'm starting to think that instead of changing the USER_PTR_MAX value
> (that was selected to be the right constant for things that don't care
> about the size), we just say "the slow case of access_ok() takes a
> tiny hit".
> 
> Kind of like Mikel Rychliski's patch, but we can certainly do better,
> ie something like
> 
>   --- a/arch/x86/include/asm/uaccess_64.h
>   +++ b/arch/x86/include/asm/uaccess_64.h
>   @@ -101,8 +101,9 @@ static inline bool __access_ok(..
>                 return valid_user_address(ptr);
>         } else {
>                 unsigned long sum = size + (__force unsigned long)ptr;
>   +             unsigned long max = runtime_const_ptr(USER_PTR_MAX)+1;
> 
>   -             return valid_user_address(sum) && sum >= (__force
> unsigned long)ptr;
>   +             return sum <= max && sum >= (__force unsigned long)ptr;
>         }
>    }
>    #define __access_ok __access_ok
> 
> would seem like it should work, and now doesn't make the fast-cases be
> off by one.

I don't think it really matters whether access_ok() returns 0 or
a later access faults.
Particularly in the corner case of an access to the base of the guard page.

> 
> Yes, it adds a "add 1" to the runtime path (instead of the init-time
> constant fixup), which is admittedly annoying. But this really
> *should* be the slow path.

There is no real reason why you can't have two constants that are
one apart.

> We do have a few annoying non-constant access_ok() calls in core code.
> The iter code uses access_ok + raw_copy_to_user, because it's evil and
> bad. I'm really not sure why it does that. I think it's *purely* bad
> history, ie we used to do access_ok() far away from the iov_iter copy,
> and then did __copy_from_user(), and then at some point it got changed
> to have the access_ok() closer, rather than just use
> "copy_from_user()".

Is there still an access_ok() in iovec_import (I've not got a tree handy).
All seemed too far away from any actual copy to me.
(IIRC there is a pretty pointless 'direction' flag as well?)

> So I get the feeling that those access_ok() cases could be removed
> entirely in favor of just using the right user copy function. Getting
> rid of the whole silly size checking thing.

That would be a plan.
Would be interesting to unravel what io_uring (or was it bpf) was
doing where it casts a 'long' to a user pointer just to validate it.

> David, does that patch above work for you?

You are the boss :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)