Message-ID: <0049a144-be4d-46ca-acaf-cfe37ff06e6e@csgroup.eu>
Date: Tue, 4 Nov 2025 07:20:42 +0100
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Andrew Cooper <andrew.cooper3@...rix.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
David Laight <david.laight.linux@...il.com>,
kernel test robot <lkp@...el.com>, Russell King <linux@...linux.org.uk>,
linux-arm-kernel@...ts.infradead.org, x86@...nel.org,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
Michael Ellerman <mpe@...erman.id.au>, Nicholas Piggin <npiggin@...il.com>,
linuxppc-dev@...ts.ozlabs.org, Paul Walmsley <pjw@...nel.org>,
Palmer Dabbelt <palmer@...belt.com>, linux-riscv@...ts.infradead.org,
Heiko Carstens <hca@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, linux-s390@...r.kernel.org,
Julia Lawall <Julia.Lawall@...ia.fr>, Nicolas Palix <nicolas.palix@...g.fr>,
Peter Zijlstra <peterz@...radead.org>, Darren Hart <dvhart@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>, André Almeida
<andrealmeid@...lia.com>, Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
linux-fsdevel@...r.kernel.org
Subject: Re: [patch V5 07/12] uaccess: Provide scoped user access regions
On 27/10/2025 at 09:43, Thomas Gleixner wrote:
> User space access regions are tedious and require similar code patterns all
> over the place:
>
>         if (!user_read_access_begin(from, sizeof(*from)))
>                 return -EFAULT;
>         unsafe_get_user(val, from, Efault);
>         user_read_access_end();
>         return 0;
> Efault:
>         user_read_access_end();
>         return -EFAULT;
>
> This got worse with the recent addition of masked user access, which
> optimizes the speculation prevention:
>
>         if (can_do_masked_user_access())
>                 from = masked_user_read_access_begin((from));
>         else if (!user_read_access_begin(from, sizeof(*from)))
>                 return -EFAULT;
>         unsafe_get_user(val, from, Efault);
>         user_read_access_end();
>         return 0;
> Efault:
>         user_read_access_end();
>         return -EFAULT;
>
> There have been issues with using the wrong user_*_access_end() variant in
> the error path and other typical Copy&Pasta problems, e.g. using the wrong
> fault label in the user accessor, which ends up invoking the wrong access
> end variant.
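
Side note, purely to illustrate the bug class described above (a sketch,
not from the patch): the region is opened with the read variant, but the
error path ends it with the plain variant:

        if (!user_read_access_begin(from, sizeof(*from)))
                return -EFAULT;
        unsafe_get_user(val, from, Efault);
        user_read_access_end();
        return 0;
Efault:
        user_access_end();      /* Wrong: must be user_read_access_end() */
        return -EFAULT;
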
>
> These patterns beg for scopes with automatic cleanup. The resulting outcome
> is:
>         scoped_user_read_access(from, Efault)
>                 unsafe_get_user(val, from, Efault);
>         return 0;
> Efault:
>         return -EFAULT;
>
> The scope guarantees the proper cleanup for the access mode is invoked both
> in the success and the failure (fault) path.
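
As an illustration for readers of the thread (not from the patch): the
cleanup guarantee means an early 'return' from inside the scope still
invokes the matching user_read_access_end(). A minimal sketch, assuming
the scoped_user_read_access() macro added below; the helper name is made
up:

        static int read_u32_from_user(u32 __user *from, u32 *val)
        {
                scoped_user_read_access(from, Efault) {
                        /* Leaving the scope via return runs the access end */
                        unsafe_get_user(*val, from, Efault);
                        return 0;
                }
        Efault:
                return -EFAULT;
        }
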
>
> The scoped_user_$MODE_access() macros are implemented as self-terminating
> nested for() loops. Thanks to Andrew Cooper for pointing me at them. The
> scope can therefore be left with 'break', 'goto' and 'return'. Even
> 'continue' "works" due to the self-termination mechanism. Both GCC and
> clang optimize the convoluted macro maze away, and with clang the above
> results in:
>
> b80: f3 0f 1e fa endbr64
> b84: 48 b8 ef cd ab 89 67 45 23 01 movabs $0x123456789abcdef,%rax
> b8e: 48 39 c7 cmp %rax,%rdi
> b91: 48 0f 47 f8 cmova %rax,%rdi
> b95: 90 nop
> b96: 90 nop
> b97: 90 nop
> b98: 31 c9 xor %ecx,%ecx
> b9a: 8b 07 mov (%rdi),%eax
> b9c: 89 06 mov %eax,(%rsi)
> b9e: 85 c9 test %ecx,%ecx
> ba0: 0f 94 c0 sete %al
> ba3: 90 nop
> ba4: 90 nop
> ba5: 90 nop
> ba6: c3 ret
>
> Which looks as compact as it gets. The NOPs are placeholders for STAC/CLAC.
> GCC emits the fault path separately:
>
> bf0: f3 0f 1e fa endbr64
> bf4: 48 b8 ef cd ab 89 67 45 23 01 movabs $0x123456789abcdef,%rax
> bfe: 48 39 c7 cmp %rax,%rdi
> c01: 48 0f 47 f8 cmova %rax,%rdi
> c05: 90 nop
> c06: 90 nop
> c07: 90 nop
> c08: 31 d2 xor %edx,%edx
> c0a: 8b 07 mov (%rdi),%eax
> c0c: 89 06 mov %eax,(%rsi)
> c0e: 85 d2 test %edx,%edx
> c10: 75 09 jne c1b <afoo+0x2b>
> c12: 90 nop
> c13: 90 nop
> c14: 90 nop
> c15: b8 01 00 00 00 mov $0x1,%eax
> c1a: c3 ret
> c1b: 90 nop
> c1c: 90 nop
> c1d: 90 nop
> c1e: 31 c0 xor %eax,%eax
> c20: c3 ret
>
>
> The fault labels for the scoped*() macros and the fault labels for the
> actual user space accessors can be shared and must be placed outside of the
> scope.
>
> If masked user access is enabled on an architecture, then the pointer
> handed in to scoped_user_$MODE_access() can be modified to point to a
> guaranteed faulting user address. This modification is only scope local as
> the pointer is aliased inside the scope. When the scope is left the alias
> is no longer in effect. IOW the original pointer value is preserved so it
> can be used e.g. for fixup or diagnostic purposes in the fault path.
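
To illustrate both points with a sketch (not from the patch): the fault
label lives outside the scope and can be shared by the region check and
the accessors, and since the pointer is only aliased inside the scope the
fault path still sees the original user pointer:

        scoped_user_read_access(from, Efault) {
                unsafe_get_user(val, from, Efault);
        }
        return 0;
Efault:
        /* 'from' still has its original value here */
        pr_debug("fault on user pointer %p\n", from);
        return -EFAULT;
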
>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Cc: Christophe Leroy <christophe.leroy@...roup.eu>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Cc: Andrew Cooper <andrew.cooper3@...rix.com>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: David Laight <david.laight.linux@...il.com>
> ---
> V4: Remove the _masked_ naming as it's actually confusing - David
> Remove underscores and make _tmpptr void - David
> Add comment about access size and range - David
> Shorten local variables and remove a few unneeded brackets - Mathieu
> V3: Make it a nested for() loop
> Get rid of the code in macro parameters - Linus
> Provide sized variants - Mathieu
> V2: Remove the shady wrappers around the opening and use scopes with automatic cleanup
> ---
> include/linux/uaccess.h | 192 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 192 insertions(+)
>
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -2,6 +2,7 @@
> #ifndef __LINUX_UACCESS_H__
> #define __LINUX_UACCESS_H__
>
> +#include <linux/cleanup.h>
> #include <linux/fault-inject-usercopy.h>
> #include <linux/instrumented.h>
> #include <linux/minmax.h>
> @@ -35,9 +36,17 @@
>
> #ifdef masked_user_access_begin
> #define can_do_masked_user_access() 1
> +# ifndef masked_user_write_access_begin
> +# define masked_user_write_access_begin masked_user_access_begin
> +# endif
> +# ifndef masked_user_read_access_begin
> +# define masked_user_read_access_begin masked_user_access_begin
> +#endif
You should move this out of the #ifdef/#else and remove the #else part
below; it should work since masked_user_access_begin(src) is defined in
both cases.
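
I.e. something along these lines (untested sketch, keeping the names from
the patch):

        #ifdef masked_user_access_begin
        # define can_do_masked_user_access()            1
        #else
        # define can_do_masked_user_access()            0
        # define masked_user_access_begin(src)          NULL
        # define mask_user_address(src)                 (src)
        #endif

        /* Works in both cases as masked_user_access_begin() is always defined */
        #ifndef masked_user_read_access_begin
        # define masked_user_read_access_begin(src)     masked_user_access_begin(src)
        #endif
        #ifndef masked_user_write_access_begin
        # define masked_user_write_access_begin(src)    masked_user_access_begin(src)
        #endif
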
> #else
> #define can_do_masked_user_access() 0
> #define masked_user_access_begin(src) NULL
> + #define masked_user_read_access_begin(src) NULL
> + #define masked_user_write_access_begin(src) NULL
> #define mask_user_address(src) (src)
> #endif
>
> @@ -633,6 +642,189 @@ static inline void user_access_restore(u
> #define user_read_access_end user_access_end
> #endif
>
> +/* Define RW variant so the below _mode macro expansion works */
> +#define masked_user_rw_access_begin(u) masked_user_access_begin(u)
> +#define user_rw_access_begin(u, s) user_access_begin(u, s)
> +#define user_rw_access_end() user_access_end()
> +
> +/* Scoped user access */
> +#define USER_ACCESS_GUARD(_mode) \
> +static __always_inline void __user * \
> +class_user_##_mode##_begin(void __user *ptr) \
> +{ \
> + return ptr; \
> +} \
> + \
> +static __always_inline void \
> +class_user_##_mode##_end(void __user *ptr) \
> +{ \
> + user_##_mode##_access_end(); \
> +} \
> + \
> +DEFINE_CLASS(user_ ##_mode## _access, void __user *, \
> + class_user_##_mode##_end(_T), \
> + class_user_##_mode##_begin(ptr), void __user *ptr) \
> + \
> +static __always_inline class_user_##_mode##_access_t \
> +class_user_##_mode##_access_ptr(void __user *scope) \
> +{ \
> + return scope; \
> +}
> +
> +USER_ACCESS_GUARD(read)
> +USER_ACCESS_GUARD(write)
> +USER_ACCESS_GUARD(rw)
> +#undef USER_ACCESS_GUARD
> +
> +/**
> + * __scoped_user_access_begin - Start a scoped user access
> + * @mode: The mode of the access class (read, write, rw)
> + * @uptr: The pointer to access user space memory
> + * @size: Size of the access
> + * @elbl: Error label to goto when the access region is rejected
> + *
> + * Internal helper for __scoped_user_access(). Don't use directly
> + */
> +#define __scoped_user_access_begin(mode, uptr, size, elbl) \
> +({ \
> + typeof(uptr) __retptr; \
> + \
> + if (can_do_masked_user_access()) { \
> + __retptr = masked_user_##mode##_access_begin(uptr); \
> + } else { \
> + __retptr = uptr; \
> + if (!user_##mode##_access_begin(uptr, size)) \
> + goto elbl; \
> + } \
> + __retptr; \
> +})
> +
> +/**
> + * __scoped_user_access - Open a scope for user access
> + * @mode: The mode of the access class (read, write, rw)
> + * @uptr: The pointer to access user space memory
> + * @size: Size of the access
> + * @elbl: Error label to goto when the access region is rejected. It
> + * must be placed outside the scope
> + *
> + * If the user access function inside the scope requires a fault label, it
> + * can use @elbl or a different label outside the scope, which requires
> + * that user access which is implemented with ASM GOTO has been properly
> + * wrapped. See unsafe_get_user() for reference.
> + *
> + * scoped_user_rw_access(ptr, efault) {
> + * unsafe_get_user(rval, &ptr->rval, efault);
> + * unsafe_put_user(wval, &ptr->wval, efault);
> + * }
> + * return 0;
> + * efault:
> + * return -EFAULT;
> + *
> + * The scope is internally implemented as an autoterminating nested for()
> + * loop, which can be left with 'return', 'break' and 'goto' at any
> + * point.
> + *
> + * When the scope is left, user_##@mode##_access_end() is automatically
> + * invoked.
> + *
> + * When the architecture supports masked user access and the access region
> + * which is determined by @uptr and @size is not a valid user space
> + * address, i.e. not below TASK_SIZE, the scope sets the pointer to a faulting user
> + * space address and does not terminate early. This optimizes for the good
> + * case and lets the performance uncritical bad case go through the fault.
> + *
> + * The eventual modification of the pointer is limited to the scope.
> + * Outside of the scope the original pointer value is unmodified, so that
> + * the original pointer value is available for diagnostic purposes in an
> + * out of scope fault path.
> + *
> + * Nesting scoped user access into a user access scope is invalid and fails
> + * the build. Nesting into other guards, e.g. pagefault is safe.
> + *
> + * The masked variant does not check the size of the access and relies on a
> + * mapping hole (e.g. guard page) to catch an out of range pointer. The
> + * first access to user memory inside the scope therefore has to be within
> + * @uptr ... @uptr + PAGE_SIZE - 1.
> + *
> + * Don't use directly. Use scoped_user_$MODE_access() instead.
> + */
> +#define __scoped_user_access(mode, uptr, size, elbl) \
> +for (bool done = false; !done; done = true) \
> + for (void __user *_tmpptr = __scoped_user_access_begin(mode, uptr, size, elbl); \
> + !done; done = true) \
> + for (CLASS(user_##mode##_access, scope)(_tmpptr); !done; done = true) \
> + /* Force modified pointer usage within the scope */ \
> + for (const typeof(uptr) uptr = _tmpptr; !done; done = true)
> +
> +/**
> + * scoped_user_read_access_size - Start a scoped user read access with given size
> + * @usrc: Pointer to the user space address to read from
> + * @size: Size of the access starting from @usrc
> + * @elbl: Error label to goto when the access region is rejected
> + *
> + * For further information see __scoped_user_access() above.
> + */
> +#define scoped_user_read_access_size(usrc, size, elbl) \
> + __scoped_user_access(read, usrc, size, elbl)
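
For readers, a usage sketch of the sized variant (the helper is made up,
not part of the patch), reading two consecutive u32s with a single region
check:

        static int read_pair_from_user(u32 __user *usrc, u32 *a, u32 *b)
        {
                scoped_user_read_access_size(usrc, 2 * sizeof(u32), Efault) {
                        unsafe_get_user(*a, &usrc[0], Efault);
                        unsafe_get_user(*b, &usrc[1], Efault);
                }
                return 0;
        Efault:
                return -EFAULT;
        }
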
> +
> +/**
> + * scoped_user_read_access - Start a scoped user read access
> + * @usrc: Pointer to the user space address to read from
> + * @elbl: Error label to goto when the access region is rejected
> + *
> + * The size of the access starting from @usrc is determined via sizeof(*@usrc).
> + *
> + * For further information see __scoped_user_access() above.
> + */
> +#define scoped_user_read_access(usrc, elbl) \
> + scoped_user_read_access_size(usrc, sizeof(*(usrc)), elbl)
> +
> +/**
> + * scoped_user_write_access_size - Start a scoped user write access with given size
> + * @udst: Pointer to the user space address to write to
> + * @size: Size of the access starting from @udst
> + * @elbl: Error label to goto when the access region is rejected
> + *
> + * For further information see __scoped_user_access() above.
> + */
> +#define scoped_user_write_access_size(udst, size, elbl) \
> + __scoped_user_access(write, udst, size, elbl)
> +
> +/**
> + * scoped_user_write_access - Start a scoped user write access
> + * @udst: Pointer to the user space address to write to
> + * @elbl: Error label to goto when the access region is rejected
> + *
> + * The size of the access starting from @udst is determined via sizeof(*@udst).
> + *
> + * For further information see __scoped_user_access() above.
> + */
> +#define scoped_user_write_access(udst, elbl) \
> + scoped_user_write_access_size(udst, sizeof(*(udst)), elbl)
> +
> +/**
> + * scoped_user_rw_access_size - Start a scoped user read/write access with given size
> + * @uptr: Pointer to the user space address to read from and write to
> + * @size: Size of the access starting from @uptr
> + * @elbl: Error label to goto when the access region is rejected
> + *
> + * For further information see __scoped_user_access() above.
> + */
> +#define scoped_user_rw_access_size(uptr, size, elbl) \
> + __scoped_user_access(rw, uptr, size, elbl)
> +
> +/**
> + * scoped_user_rw_access - Start a scoped user read/write access
> + * @uptr: Pointer to the user space address to read from and write to
> + * @elbl: Error label to goto when the access region is rejected
> + *
> + * The size of the access starting from @uptr is determined via sizeof(*@uptr).
> + *
> + * For further information see __scoped_user_access() above.
> + */
> +#define scoped_user_rw_access(uptr, elbl) \
> + scoped_user_rw_access_size(uptr, sizeof(*(uptr)), elbl)
> +
> #ifdef CONFIG_HARDENED_USERCOPY
> void __noreturn usercopy_abort(const char *name, const char *detail,
> bool to_user, unsigned long offset,
>