Message-ID: <5fc3d97e-2736-468a-8038-8ae411385590@arm.com>
Date: Fri, 19 Dec 2025 17:19:06 +0000
From: Robin Murphy <robin.murphy@....com>
To: Ben Niu <benniu@...a.com>, Catalin Marinas <catalin.marinas@....com>
Cc: Will Deacon <will@...nel.org>,
 Kristina Martsenko <kristina.martsenko@....com>,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] Faster Arm64 __arch_copy_from_user and
 __arch_copy_to_user

On 18/10/2025 6:22 am, Ben Niu wrote:
> Summary:
> 
> This patch adapts arch/arm64/lib/memcpy.S for __arch_copy_from_user
> and __arch_copy_to_user; the new implementations are faster than the
> current ones in almost all of the benchmarks below.
> 
> For __arch_copy_from_user, two new versions are provided: one for
> when PAN is enabled, in which case memcpy.S's ldp/ldr are replaced
> with ldtr; the other for when PAN is disabled. In both cases, proper
> fault handling code is added to handle exceptions when reading from
> unmapped user-mode pages.
> 
> Similarly, __arch_copy_to_user's PAN version has memcpy.S's stp/str
> replaced with sttr, and fault handling code is likewise added.
> 
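(In concrete terms, a minimal sketch distilled from the hunks further
down rather than a verbatim excerpt: AArch64 has no unprivileged pair
form, so under PAN each 16-byte ldp from user memory becomes two
USER()-wrapped ldtr accesses, and a fault on either one diverts to a
fixup label that reports the bytes not copied.)

	/* memcpy.S form */
	ldp	A_l, A_h, [src]
	stp	A_l, A_h, [dstin]

	/* copy_from_user form with PAN enabled */
	USER(9000f, ldtr A_l, [src])
	USER(9000f, ldtr A_h, [src, 8])
	stp	A_l, A_h, [dstin]
	...
9000:	/* fixup: compute and return bytes not copied in x0 */
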
> In addition, a new test case usercopy_test_init is added in
> lib/tests/usercopy_kunit.c to exhaustively test all possible cases
> in which the new implementations could fault against user-mode
> pages.
> 
> Test Plan:
> 
> For functionality, I booted private kernels with the new
> implementations, both with and without PAN, and ran usercopy_kunit;
> all the tests passed. I also tested the new functions with Arm's
> memcpy test (string/test/memcpy.c) from
> https://github.com/ARM-software/optimized-routines.
> 
> For performance, I used Arm's memcpy benchmark (string/bench/memcpy.c)
> from https://github.com/ARM-software/optimized-routines. See below for
> results on Neoverse V2:
> 
> Baseline:
> Random memcpy (bytes/ns):
>     __arch_copy_to_user 32K: 12.10 64K: 11.98 128K: 12.11 256K: 11.50 512K: 10.88 1024K:  8.60 avg 11.03
>   __arch_copy_from_user 32K: 12.04 64K: 11.82 128K: 12.11 256K: 11.94 512K: 10.99 1024K:  8.81 avg 11.15
> 
> Medium memcpy aligned (bytes/ns):
>     __arch_copy_to_user 8B:  5.25 16B:  7.52 32B: 15.05 64B: 26.27 128B: 46.56 256B: 50.71 512B: 51.66
>   __arch_copy_from_user 8B:  5.26 16B:  7.53 32B: 15.04 64B: 33.32 128B: 46.57 256B: 51.89 512B: 52.10
> 
> Medium memcpy unaligned (bytes/ns):
>     __arch_copy_to_user 8B:  5.27 16B:  6.57 32B: 11.63 64B: 23.28 128B: 27.17 256B: 40.80 512B: 46.52
>   __arch_copy_from_user 8B:  5.27 16B:  6.55 32B: 11.64 64B: 23.28 128B: 34.95 256B: 43.54 512B: 47.02
> 
> Large memcpy (bytes/ns):
>     __arch_copy_to_user 1K: 51.70 2K: 52.36 4K: 52.12 8K: 52.36 16K: 51.87 32K: 52.07 64K: 51.01
>   __arch_copy_from_user 1K: 52.43 2K: 52.35 4K: 52.34 8K: 52.27 16K: 51.86 32K: 52.14 64K: 52.17
> 
> New (with PAN):
> Random memcpy (bytes/ns):
>     __arch_copy_to_user 32K: 20.81 64K: 20.22 128K: 19.63 256K: 18.89 512K: 12.84 1024K:  9.83 avg 15.74
>   __arch_copy_from_user 32K: 23.28 64K: 22.21 128K: 21.49 256K: 21.07 512K: 14.60 1024K: 10.82 avg 17.52
> 
> Medium memcpy aligned (bytes/ns):
>     __arch_copy_to_user 8B:  7.53 16B: 17.57 32B: 21.11 64B: 26.91 128B: 46.80 256B: 46.33 512B: 49.32
>   __arch_copy_from_user 8B:  7.53 16B: 17.53 32B: 30.21 64B: 31.24 128B: 52.03 256B: 49.61 512B: 51.11
> 
> Medium memcpy unaligned (bytes/ns):
>     __arch_copy_to_user 8B:  7.53 16B: 13.16 32B: 26.30 64B: 24.06 128B: 30.10 256B: 30.15 512B: 30.38
>   __arch_copy_from_user 8B:  7.53 16B: 17.58 32B: 35.12 64B: 26.36 128B: 38.66 256B: 45.64 512B: 47.18
> 
> Large memcpy (bytes/ns):
>     __arch_copy_to_user 1K: 50.90 2K: 51.85 4K: 51.86 8K: 52.32 16K: 52.44 32K: 52.53 64K: 52.51
>   __arch_copy_from_user 1K: 51.92 2K: 52.32 4K: 52.47 8K: 52.27 16K: 52.51 32K: 52.62 64K: 52.57
> 
> New (without PAN):
> NoPAN
> Random memcpy (bytes/ns):
>     __arch_copy_to_user 32K: 23.20 64K: 22.02 128K: 21.06 256K: 19.34 512K: 17.46 1024K: 11.76 avg 18.18
>   __arch_copy_from_user 32K: 24.44 64K: 23.41 128K: 22.53 256K: 21.23 512K: 17.84 1024K: 11.71 avg 18.97
> 
> Medium memcpy aligned (bytes/ns):
>     __arch_copy_to_user 8B:  7.56 16B: 17.64 32B: 33.65 64B: 33.10 128B: 57.97 256B: 70.43 512B: 75.89
>   __arch_copy_from_user 8B:  7.57 16B: 17.67 32B: 32.89 64B: 31.40 128B: 52.93 256B: 71.36 512B: 75.97
> 
> Medium memcpy unaligned (bytes/ns):
>     __arch_copy_to_user 8B:  7.57 16B: 17.65 32B: 35.29 64B: 31.01 128B: 38.93 256B: 44.58 512B: 46.24
>   __arch_copy_from_user 8B:  7.57 16B: 17.67 32B: 35.23 64B: 29.51 128B: 40.30 256B: 44.57 512B: 46.26
> 
> Large memcpy (bytes/ns):
>     __arch_copy_to_user 1K: 77.33 2K: 77.89 4K: 78.19 8K: 76.36 16K: 77.39 32K: 77.94 64K: 77.72
>   __arch_copy_from_user 1K: 77.40 2K: 77.94 4K: 78.28 8K: 76.56 16K: 77.56 32K: 77.92 64K: 77.69
> 
> As can be seen, the new versions are faster than the baseline in almost all tests. The only slower
> cases are the 256B and 512B unaligned copies with PAN, and I hope the reviewers from Arm and the
> community can offer some suggestions on how to mitigate them.

In fairness it's quite hard *not* to be at least somewhat faster than 
the current code, but beware that big cores like Neoverse V are going to 
be the most forgiving; the results could be quite different for 
something like Cortex-A53, and even those older CPUs are still very 
widely used, so they do matter.

Certainly I found that a straight transliteration to ldtr/sttr pairs in 
our memcpy routine is less than optimal across a range of 
microarchitectures, and it's possible to do a fair bit better still for 
many cases. Furthermore, microbenchmarks are one thing, but even a 100% 
improvement here may only equate to 0.1% on an overall workload (based 
on squinting at some numbers from our CI that would need orders of 
magnitude more runs to reach any statistical significance). So there 
would really need to be an overwhelmingly strong justification for 
having a separate !CONFIG_ARM64_PAN version to chase a relatively small 
fraction more at the potential cost of weakening security/robustness 
(or, at the very least, the ongoing cost of having to *reason* about 
security more than we currently have to with the architectural 
guarantees of unprivileged accesses).

As for the patch itself, from a quick look I have to say I find the mass 
of numeric labels even harder to follow than the monstrosity I came up 
with 2 years ago, and while it looks like you don't quite have the same 
copy_to_user bug I had (missing the adjustment of src in the copy_long 
retry case), I think what you've done instead won't work for big-endian.
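
(A sketch of the kind of pitfall meant here, assuming the concern is the
byte-wise "try harder" fixups: they reuse the low byte of a register
that was filled by a full-width ldr/ldp, and that low byte only
corresponds to the first source byte on little-endian.)

	ldp	D_l, D_h, [src]		/* D_l holds src[0..7] */
	...
	/* fixup path */
	sttrb	D_lw, [dstin]		/* LE: stores src[0]; BE: stores src[7] */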

Thanks,
Robin.

> ---
> v2:
>    - Added linux-arm-kernel@...ts.infradead.org and linux-kernel@...r.kernel.org
>      to recipient lists for public submission
>    - No code changes
> ---
>   arch/arm64/lib/copy_from_user.S | 434 ++++++++++++++++++++--
>   arch/arm64/lib/copy_template.S  | 191 ----------
>   arch/arm64/lib/copy_to_user.S   | 615 ++++++++++++++++++++++++++++++--
>   lib/tests/usercopy_kunit.c      | 303 ++++++++++++----
>   4 files changed, 1199 insertions(+), 344 deletions(-)
>   delete mode 100644 arch/arm64/lib/copy_template.S
> 
> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
> index 400057d607ec..a4e8dbd10336 100644
> --- a/arch/arm64/lib/copy_from_user.S
> +++ b/arch/arm64/lib/copy_from_user.S
> @@ -20,38 +20,6 @@
>    *	x0 - bytes not copied
>    */
>   
> -	.macro ldrb1 reg, ptr, val
> -	user_ldst 9998f, ldtrb, \reg, \ptr, \val
> -	.endm
> -
> -	.macro strb1 reg, ptr, val
> -	strb \reg, [\ptr], \val
> -	.endm
> -
> -	.macro ldrh1 reg, ptr, val
> -	user_ldst 9997f, ldtrh, \reg, \ptr, \val
> -	.endm
> -
> -	.macro strh1 reg, ptr, val
> -	strh \reg, [\ptr], \val
> -	.endm
> -
> -	.macro ldr1 reg, ptr, val
> -	user_ldst 9997f, ldtr, \reg, \ptr, \val
> -	.endm
> -
> -	.macro str1 reg, ptr, val
> -	str \reg, [\ptr], \val
> -	.endm
> -
> -	.macro ldp1 reg1, reg2, ptr, val
> -	user_ldp 9997f, \reg1, \reg2, \ptr, \val
> -	.endm
> -
> -	.macro stp1 reg1, reg2, ptr, val
> -	stp \reg1, \reg2, [\ptr], \val
> -	.endm
> -
>   	.macro cpy1 dst, src, count
>   	.arch_extension mops
>   	USER_CPY(9997f, 0, cpyfprt [\dst]!, [\src]!, \count!)
> @@ -59,13 +27,45 @@
>   	USER_CPY(9996f, 0, cpyfert [\dst]!, [\src]!, \count!)
>   	.endm
>   
> -end	.req	x5
> -srcin	.req	x15
> +dstin	.req	x0
> +src	.req	x1
> +count	.req	x2
> +dst	.req	x3
> +srcend	.req	x4
> +dstend	.req	x5
> +srcin	.req	x6
> +A_l	.req	x6
> +A_lw	.req	w6
> +A_h	.req	x7
> +B_l	.req	x8
> +B_lw	.req	w8
> +B_h	.req	x9
> +C_l	.req	x10
> +C_lw	.req	w10
> +C_h	.req	x11
> +D_l	.req	x12
> +D_h	.req	x13
> +E_l	.req	x14
> +E_h	.req	x15
> +F_l	.req	x16
> +F_h	.req	x17
> +G_l	.req	count
> +G_h	.req	dst
> +H_l	.req	src
> +H_h	.req	srcend
> +tmp1	.req	x14
> +tmp2	.req	x15
> +
>   SYM_FUNC_START(__arch_copy_from_user)
> -	add	end, x0, x2
> +#ifdef CONFIG_AS_HAS_MOPS
> +alternative_if_not ARM64_HAS_MOPS
> +	b	.Lno_mops
> +alternative_else_nop_endif
> +	add	dstend, x0, x2
>   	mov	srcin, x1
> -#include "copy_template.S"
> -	mov	x0, #0				// Nothing to copy
> +	mov	dst, dstin
> +	cpy1	dst, src, count
> +	mov	x0, #0				// Nothing left to copy
>   	ret
>   
>   	// Exception fixups
> @@ -79,5 +79,365 @@ USER(9998f, ldtrb tmp1w, [srcin])
>   	strb	tmp1w, [dst], #1
>   9998:	sub	x0, end, dst			// bytes not copied
>   	ret
> +
> +.Lno_mops:
> +#endif
> +
> +#ifdef CONFIG_ARM64_PAN
> +	add	srcend, src, count
> +	add	dstend, dstin, count
> +	cmp	count, 128
> +	b.hi	.Lcopy_long
> +	cmp	count, 32
> +	b.hi	.Lcopy32_128
> +
> +	/* Small copies: 0..32 bytes.  */
> +	cmp	count, 16
> +	b.lo	.Lcopy16
> +	USER(9000f, ldtr A_l, [src])
> +	USER(9000f, ldtr A_h, [src, 8])
> +	USER(9000f, ldtr D_l, [srcend, -16])
> +	USER(9000f, ldtr D_h, [srcend, -8])
> +	stp	A_l, A_h, [dstin]
> +	stp	D_l, D_h, [dstend, -16]
> +	mov	x0, #0
> +	ret
> +
> +	/* Copy 8-15 bytes.  */
> +.Lcopy16:
> +	tbz	count, 3, .Lcopy8
> +	USER(9000f, ldtr	A_l, [src])
> +	USER(9000f, ldtr	A_h, [srcend, -8])
> +	str	A_l, [dstin]
> +	str	A_h, [dstend, -8]
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 3
> +	/* Copy 4-7 bytes.  */
> +.Lcopy8:
> +	tbz	count, 2, .Lcopy4
> +	USER(9000f, ldtr	A_lw, [src])
> +	USER(9000f, ldtr	B_lw, [srcend, -4])
> +	str	A_lw, [dstin]
> +	str	B_lw, [dstend, -4]
> +	mov	x0, #0
> +	ret
> +
> +	/* Copy 0..3 bytes using a branchless sequence.  */
> +.Lcopy4:
> +	cbz	count, .Lcopy0
> +	lsr	tmp1, count, 1
> +	add tmp2, src, count, lsr 1
> +	USER(9000f, ldtrb	A_lw, [src])
> +	USER(9000f, ldtrb	B_lw, [tmp2])
> +	USER(9000f, ldtrb	C_lw, [srcend, -1])
> +	strb	A_lw, [dstin]
> +	strb	B_lw, [dstin, tmp1]
> +	strb	C_lw, [dstend, -1]
> +.Lcopy0:
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Medium copies: 33..128 bytes.  */
> +.Lcopy32_128:
> +	USER(9000f, ldtr A_l, [src])
> +	USER(9000f, ldtr A_h, [src, 8])
> +	USER(9000f, ldtr B_l, [src, 16])
> +	USER(9000f, ldtr B_h, [src, 24])
> +	stp	A_l, A_h, [dstin]
> +	stp	B_l, B_h, [dstin, 16]
> +	cmp	count, 64
> +	b.hi	.Lcopy128
> +	USER(9001f, ldtr C_l, [srcend, -32])
> +	USER(9001f, ldtr C_h, [srcend, -24])
> +	USER(9001f, ldtr D_l, [srcend, -16])
> +	USER(9001f, ldtr D_h, [srcend, -8])
> +	stp	C_l, C_h, [dstend, -32]
> +	stp	D_l, D_h, [dstend, -16]
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Copy 65..128 bytes.  */
> +.Lcopy128:
> +	USER(9001f, ldtr E_l, [src, 32])
> +	USER(9001f, ldtr E_h, [src, 40])
> +	USER(9001f, ldtr F_l, [src, 48])
> +	USER(9001f, ldtr F_h, [src, 56])
> +	stp	E_l, E_h, [dstin, 32]
> +	stp	F_l, F_h, [dstin, 48]
> +	cmp	count, 96
> +	b.ls	.Lcopy96
> +	USER(9002f, ldtr C_l, [srcend, -64])
> +	USER(9002f, ldtr C_h, [srcend, -56])
> +	USER(9002f, ldtr D_l, [srcend, -48])
> +	USER(9002f, ldtr D_h, [srcend, -40])
> +	stp	C_l, C_h, [dstend, -64]
> +	stp	D_l, D_h, [dstend, -48]
> +.Lcopy96:
> +	USER(9002f, ldtr G_l, [srcend, -32])
> +	USER(9002f, ldtr G_h, [srcend, -24])
> +	USER(9002f, ldtr H_l, [srcend, -16])
> +	USER(9002f, ldtr H_h, [srcend, -8])
> +	stp	G_l, G_h, [dstend, -32]
> +	stp	H_l, H_h, [dstend, -16]
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Copy more than 128 bytes.  */
> +.Lcopy_long:
> +	/* Copy 16 bytes and then align dst to 16-byte alignment.  */
> +	USER(9000f, ldtr	D_l, [src])
> +	USER(9000f, ldtr	D_h, [src, 8])
> +	and	tmp1, dstin, 15
> +	bic	dst, dstin, 15
> +	sub	src, src, tmp1
> +	add	count, count, tmp1	/* Count is now 16 too large.  */
> +	USER(9003f, ldtr	A_l, [src, 16])
> +	USER(9003f, ldtr	A_h, [src, 24])
> +	stp	D_l, D_h, [dstin]
> +	USER(9004f, ldtr	B_l, [src, 32])
> +	USER(9004f, ldtr	B_h, [src, 40])
> +	USER(9004f, ldtr	C_l, [src, 48])
> +	USER(9004f, ldtr	C_h, [src, 56])
> +	USER(9004f, ldtr	D_l, [src, 64])
> +	USER(9004f, ldtr	D_h, [src, 72])
> +	add		src, src, 64
> +	subs	count, count, 128 + 16	/* Test and readjust count.  */
> +	b.ls	.Lcopy64_from_end
> +.Lloop64:
> +	stp	A_l, A_h, [dst, 16]
> +	USER(9005f, ldtr	A_l, [src, 16])
> +	USER(9005f, ldtr	A_h, [src, 24])
> +	stp	B_l, B_h, [dst, 32]
> +	USER(9006f, ldtr	B_l, [src, 32])
> +	USER(9006f, ldtr	B_h, [src, 40])
> +	stp	C_l, C_h, [dst, 48]
> +	USER(9007f, ldtr	C_l, [src, 48])
> +	USER(9007f, ldtr	C_h, [src, 56])
> +	stp	D_l, D_h, [dst, 64]!
> +	USER(9008f, ldtr	D_l, [src, 64])
> +	USER(9008f, ldtr	D_h, [src, 72])
> +	add		src, src, 64
> +	subs	count, count, 64
> +	b.hi	.Lloop64
> +
> +	/* Write the last iteration and copy 64 bytes from the end.  */
> +.Lcopy64_from_end:
> +	USER(9005f, ldtr	E_l, [srcend, -64])
> +	USER(9005f, ldtr	E_h, [srcend, -56])
> +	stp	A_l, A_h, [dst, 16]
> +	USER(9006f, ldtr	A_l, [srcend, -48])
> +	USER(9006f, ldtr	A_h, [srcend, -40])
> +	stp	B_l, B_h, [dst, 32]
> +	USER(9007f, ldtr	B_l, [srcend, -32])
> +	USER(9007f, ldtr	B_h, [srcend, -24])
> +	stp	C_l, C_h, [dst, 48]
> +	USER(9009f, ldtr	C_l, [srcend, -16])
> +	USER(9009f, ldtr	C_h, [srcend, -8])
> +	stp	D_l, D_h, [dst, 64]
> +	stp	E_l, E_h, [dstend, -64]
> +	stp	A_l, A_h, [dstend, -48]
> +	stp	B_l, B_h, [dstend, -32]
> +	stp	C_l, C_h, [dstend, -16]
> + 	mov	x0, #0				// Nothing to copy
> + 	ret
> +
> +#else
> +
> +	add	srcend, src, count
> +	add	dstend, dstin, count
> +	cmp	count, 128
> +	b.hi	.Lcopy_long
> +	cmp	count, 32
> +	b.hi	.Lcopy32_128
> +
> +	/* Small copies: 0..32 bytes.  */
> +	cmp	count, 16
> +	b.lo	.Lcopy16
> +	USER(9000f, ldp	A_l, A_h, [src])
> +	USER(9000f, ldp	D_l, D_h, [srcend, -16])
> +	stp	A_l, A_h, [dstin]
> +	stp	D_l, D_h, [dstend, -16]
> +	mov	x0, #0
> +	ret
> +
> +	/* Copy 8-15 bytes.  */
> +.Lcopy16:
> +	tbz	count, 3, .Lcopy8
> +	USER(9000f, ldr	A_l, [src])
> +	USER(9000f, ldr	A_h, [srcend, -8])
> +	str	A_l, [dstin]
> +	str	A_h, [dstend, -8]
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 3
> +	/* Copy 4-7 bytes.  */
> +.Lcopy8:
> +	tbz	count, 2, .Lcopy4
> +	USER(9000f, ldr	A_lw, [src])
> +	USER(9000f, ldr	B_lw, [srcend, -4])
> +	str	A_lw, [dstin]
> +	str	B_lw, [dstend, -4]
> +	mov	x0, #0
> +	ret
> +
> +	/* Copy 0..3 bytes using a branchless sequence.  */
> +.Lcopy4:
> +	cbz	count, .Lcopy0
> +	lsr	tmp1, count, 1
> +	USER(9000f, ldrb	A_lw, [src])
> +	USER(9000f, ldrb	B_lw, [src, tmp1])
> +	USER(9000f, ldrb	C_lw, [srcend, -1])
> +	strb	A_lw, [dstin]
> +	strb	B_lw, [dstin, tmp1]
> +	strb	C_lw, [dstend, -1]
> +.Lcopy0:
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Medium copies: 33..128 bytes.  */
> +.Lcopy32_128:
> +	USER(9000f, ldp	A_l, A_h, [src])
> +	USER(9000f, ldp	B_l, B_h, [src, 16])
> +	stp	A_l, A_h, [dstin]
> +	stp	B_l, B_h, [dstin, 16]
> +	cmp	count, 64
> +	b.hi	.Lcopy128
> +	USER(9001f, ldp	C_l, C_h, [srcend, -32])
> +	USER(9001f, ldp	D_l, D_h, [srcend, -16])
> +	stp	C_l, C_h, [dstend, -32]
> +	stp	D_l, D_h, [dstend, -16]
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Copy 65..128 bytes.  */
> +.Lcopy128:
> +	USER(9001f, ldp	E_l, E_h, [src, 32])
> +	USER(9001f, ldp	F_l, F_h, [src, 48])
> +	stp	E_l, E_h, [dstin, 32]
> +	stp	F_l, F_h, [dstin, 48]
> +	cmp	count, 96
> +	b.ls	.Lcopy96
> +	USER(9002f, ldp	C_l, C_h, [srcend, -64])
> +	USER(9002f, ldp	D_l, D_h, [srcend, -48])
> +	stp	C_l, C_h, [dstend, -64]
> +	stp	D_l, D_h, [dstend, -48]
> +.Lcopy96:
> +	USER(9002f, ldp	G_l, G_h, [srcend, -32])
> +	USER(9002f, ldp	H_l, H_h, [srcend, -16])
> +	stp	G_l, G_h, [dstend, -32]
> +	stp	H_l, H_h, [dstend, -16]
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Copy more than 128 bytes.  */
> +.Lcopy_long:
> +	/* Copy 16 bytes and then align dst to 16-byte alignment.  */
> +
> +	USER(9000f, ldp	D_l, D_h, [src])
> +	and	tmp1, dstin, 15
> +	bic	dst, dstin, 15
> +	sub	src, src, tmp1
> +	add	count, count, tmp1	/* Count is now 16 too large.  */
> +	USER(9003f, ldp	A_l, A_h, [src, 16])
> +	stp	D_l, D_h, [dstin]
> +	USER(9004f, ldp	B_l, B_h, [src, 32])
> +	USER(9004f, ldp	C_l, C_h, [src, 48])
> +	USER(9004f, ldp	D_l, D_h, [src, 64]!)
> +	subs	count, count, 128 + 16	/* Test and readjust count.  */
> +	b.ls	.Lcopy64_from_end
> +
> +.Lloop64:
> +	stp	A_l, A_h, [dst, 16]
> +	USER(9005f, ldp	A_l, A_h, [src, 16])
> +	stp	B_l, B_h, [dst, 32]
> +	USER(9006f, ldp	B_l, B_h, [src, 32])
> +	stp	C_l, C_h, [dst, 48]
> +	USER(9007f, ldp	C_l, C_h, [src, 48])
> +	stp	D_l, D_h, [dst, 64]!
> +	USER(9008f, ldp	D_l, D_h, [src, 64]!)
> +	subs	count, count, 64
> +	b.hi	.Lloop64
> +
> +	/* Write the last iteration and copy 64 bytes from the end.  */
> +.Lcopy64_from_end:
> +	USER(9005f, ldp	E_l, E_h, [srcend, -64])
> +	stp	A_l, A_h, [dst, 16]
> +	USER(9006f, ldp	A_l, A_h, [srcend, -48])
> +	stp	B_l, B_h, [dst, 32]
> +	USER(9007f, ldp	B_l, B_h, [srcend, -32])
> +	stp	C_l, C_h, [dst, 48]
> +	USER(9009f, ldp	C_l, C_h, [srcend, -16])
> +	stp	D_l, D_h, [dst, 64]
> +	stp	E_l, E_h, [dstend, -64]
> +	stp	A_l, A_h, [dstend, -48]
> +	stp	B_l, B_h, [dstend, -32]
> +	stp	C_l, C_h, [dstend, -16]
> +	mov	x0, #0				// Nothing to copy
> +	ret
> +
> +#endif
> +
> + 	// non-mops exception fixups
> +9003:
> +	sub count, count, tmp1
> +9000:
> + 	// Before being absolutely sure we couldn't copy anything, try harder
> +	USER(.Lcopy_none, ldtrb A_lw, [src])
> +	strb A_lw, [dstin]
> +	sub x0, count, 1
> +	ret
> +
> +9001:
> +	sub x0, count, 32
> +	ret
> +
> +9002:
> +	sub count, dstend, dstin
> +	sub x0, count, 64
> +	ret
> +
> +9004:
> +	sub count, count, tmp1
> +	sub x0, count, 16
> +	ret
> +
> +9005:
> +	add tmp1, dstin, 16
> +	add x0, dst, 16
> +	cmp x0, tmp1
> +	csel x0, x0, tmp1, hi
> +	b	.Lsub_destend_x0
> +
> +9006:
> +	add x0, dst, 32
> +	b	.Lsub_destend_x0
> +
> +9007:
> +	add x0, dst, 48
> +	b	.Lsub_destend_x0
> +
> +9008:
> +	sub x0, dstend, dst
> +    ret
> +
> +9009:
> +	add x0, dst, 64
> +.Lsub_destend_x0:
> +	sub x0, dstend, x0
> + 	ret
> +
> +.Lcopy_none: // bytes not copied at all
> +	mov x0, count
> +	ret
> +
>   SYM_FUNC_END(__arch_copy_from_user)
>   EXPORT_SYMBOL(__arch_copy_from_user)
> diff --git a/arch/arm64/lib/copy_template.S b/arch/arm64/lib/copy_template.S
> deleted file mode 100644
> index 7f2f5a0e2fb9..000000000000
> --- a/arch/arm64/lib/copy_template.S
> +++ /dev/null
> @@ -1,191 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * Copyright (C) 2013 ARM Ltd.
> - * Copyright (C) 2013 Linaro.
> - *
> - * This code is based on glibc cortex strings work originally authored by Linaro
> - * be found @
> - *
> - * http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/
> - * files/head:/src/aarch64/
> - */
> -
> -
> -/*
> - * Copy a buffer from src to dest (alignment handled by the hardware)
> - *
> - * Parameters:
> - *	x0 - dest
> - *	x1 - src
> - *	x2 - n
> - * Returns:
> - *	x0 - dest
> - */
> -dstin	.req	x0
> -src	.req	x1
> -count	.req	x2
> -tmp1	.req	x3
> -tmp1w	.req	w3
> -tmp2	.req	x4
> -tmp2w	.req	w4
> -dst	.req	x6
> -
> -A_l	.req	x7
> -A_h	.req	x8
> -B_l	.req	x9
> -B_h	.req	x10
> -C_l	.req	x11
> -C_h	.req	x12
> -D_l	.req	x13
> -D_h	.req	x14
> -
> -	mov	dst, dstin
> -
> -#ifdef CONFIG_AS_HAS_MOPS
> -alternative_if_not ARM64_HAS_MOPS
> -	b	.Lno_mops
> -alternative_else_nop_endif
> -	cpy1	dst, src, count
> -	b	.Lexitfunc
> -.Lno_mops:
> -#endif
> -
> -	cmp	count, #16
> -	/*When memory length is less than 16, the accessed are not aligned.*/
> -	b.lo	.Ltiny15
> -
> -	neg	tmp2, src
> -	ands	tmp2, tmp2, #15/* Bytes to reach alignment. */
> -	b.eq	.LSrcAligned
> -	sub	count, count, tmp2
> -	/*
> -	* Copy the leading memory data from src to dst in an increasing
> -	* address order.By this way,the risk of overwriting the source
> -	* memory data is eliminated when the distance between src and
> -	* dst is less than 16. The memory accesses here are alignment.
> -	*/
> -	tbz	tmp2, #0, 1f
> -	ldrb1	tmp1w, src, #1
> -	strb1	tmp1w, dst, #1
> -1:
> -	tbz	tmp2, #1, 2f
> -	ldrh1	tmp1w, src, #2
> -	strh1	tmp1w, dst, #2
> -2:
> -	tbz	tmp2, #2, 3f
> -	ldr1	tmp1w, src, #4
> -	str1	tmp1w, dst, #4
> -3:
> -	tbz	tmp2, #3, .LSrcAligned
> -	ldr1	tmp1, src, #8
> -	str1	tmp1, dst, #8
> -
> -.LSrcAligned:
> -	cmp	count, #64
> -	b.ge	.Lcpy_over64
> -	/*
> -	* Deal with small copies quickly by dropping straight into the
> -	* exit block.
> -	*/
> -.Ltail63:
> -	/*
> -	* Copy up to 48 bytes of data. At this point we only need the
> -	* bottom 6 bits of count to be accurate.
> -	*/
> -	ands	tmp1, count, #0x30
> -	b.eq	.Ltiny15
> -	cmp	tmp1w, #0x20
> -	b.eq	1f
> -	b.lt	2f
> -	ldp1	A_l, A_h, src, #16
> -	stp1	A_l, A_h, dst, #16
> -1:
> -	ldp1	A_l, A_h, src, #16
> -	stp1	A_l, A_h, dst, #16
> -2:
> -	ldp1	A_l, A_h, src, #16
> -	stp1	A_l, A_h, dst, #16
> -.Ltiny15:
> -	/*
> -	* Prefer to break one ldp/stp into several load/store to access
> -	* memory in an increasing address order,rather than to load/store 16
> -	* bytes from (src-16) to (dst-16) and to backward the src to aligned
> -	* address,which way is used in original cortex memcpy. If keeping
> -	* the original memcpy process here, memmove need to satisfy the
> -	* precondition that src address is at least 16 bytes bigger than dst
> -	* address,otherwise some source data will be overwritten when memove
> -	* call memcpy directly. To make memmove simpler and decouple the
> -	* memcpy's dependency on memmove, withdrew the original process.
> -	*/
> -	tbz	count, #3, 1f
> -	ldr1	tmp1, src, #8
> -	str1	tmp1, dst, #8
> -1:
> -	tbz	count, #2, 2f
> -	ldr1	tmp1w, src, #4
> -	str1	tmp1w, dst, #4
> -2:
> -	tbz	count, #1, 3f
> -	ldrh1	tmp1w, src, #2
> -	strh1	tmp1w, dst, #2
> -3:
> -	tbz	count, #0, .Lexitfunc
> -	ldrb1	tmp1w, src, #1
> -	strb1	tmp1w, dst, #1
> -
> -	b	.Lexitfunc
> -
> -.Lcpy_over64:
> -	subs	count, count, #128
> -	b.ge	.Lcpy_body_large
> -	/*
> -	* Less than 128 bytes to copy, so handle 64 here and then jump
> -	* to the tail.
> -	*/
> -	ldp1	A_l, A_h, src, #16
> -	stp1	A_l, A_h, dst, #16
> -	ldp1	B_l, B_h, src, #16
> -	ldp1	C_l, C_h, src, #16
> -	stp1	B_l, B_h, dst, #16
> -	stp1	C_l, C_h, dst, #16
> -	ldp1	D_l, D_h, src, #16
> -	stp1	D_l, D_h, dst, #16
> -
> -	tst	count, #0x3f
> -	b.ne	.Ltail63
> -	b	.Lexitfunc
> -
> -	/*
> -	* Critical loop.  Start at a new cache line boundary.  Assuming
> -	* 64 bytes per line this ensures the entire loop is in one line.
> -	*/
> -	.p2align	L1_CACHE_SHIFT
> -.Lcpy_body_large:
> -	/* pre-get 64 bytes data. */
> -	ldp1	A_l, A_h, src, #16
> -	ldp1	B_l, B_h, src, #16
> -	ldp1	C_l, C_h, src, #16
> -	ldp1	D_l, D_h, src, #16
> -1:
> -	/*
> -	* interlace the load of next 64 bytes data block with store of the last
> -	* loaded 64 bytes data.
> -	*/
> -	stp1	A_l, A_h, dst, #16
> -	ldp1	A_l, A_h, src, #16
> -	stp1	B_l, B_h, dst, #16
> -	ldp1	B_l, B_h, src, #16
> -	stp1	C_l, C_h, dst, #16
> -	ldp1	C_l, C_h, src, #16
> -	stp1	D_l, D_h, dst, #16
> -	ldp1	D_l, D_h, src, #16
> -	subs	count, count, #64
> -	b.ge	1b
> -	stp1	A_l, A_h, dst, #16
> -	stp1	B_l, B_h, dst, #16
> -	stp1	C_l, C_h, dst, #16
> -	stp1	D_l, D_h, dst, #16
> -
> -	tst	count, #0x3f
> -	b.ne	.Ltail63
> -.Lexitfunc:
> diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
> index 819f2e3fc7a9..e50bdcef7cdf 100644
> --- a/arch/arm64/lib/copy_to_user.S
> +++ b/arch/arm64/lib/copy_to_user.S
> @@ -19,37 +19,6 @@
>    * Returns:
>    *	x0 - bytes not copied
>    */
> -	.macro ldrb1 reg, ptr, val
> -	ldrb  \reg, [\ptr], \val
> -	.endm
> -
> -	.macro strb1 reg, ptr, val
> -	user_ldst 9998f, sttrb, \reg, \ptr, \val
> -	.endm
> -
> -	.macro ldrh1 reg, ptr, val
> -	ldrh  \reg, [\ptr], \val
> -	.endm
> -
> -	.macro strh1 reg, ptr, val
> -	user_ldst 9997f, sttrh, \reg, \ptr, \val
> -	.endm
> -
> -	.macro ldr1 reg, ptr, val
> -	ldr \reg, [\ptr], \val
> -	.endm
> -
> -	.macro str1 reg, ptr, val
> -	user_ldst 9997f, sttr, \reg, \ptr, \val
> -	.endm
> -
> -	.macro ldp1 reg1, reg2, ptr, val
> -	ldp \reg1, \reg2, [\ptr], \val
> -	.endm
> -
> -	.macro stp1 reg1, reg2, ptr, val
> -	user_stp 9997f, \reg1, \reg2, \ptr, \val
> -	.endm
>   
>   	.macro cpy1 dst, src, count
>   	.arch_extension mops
> @@ -58,16 +27,48 @@
>   	USER_CPY(9996f, 1, cpyfewt [\dst]!, [\src]!, \count!)
>   	.endm
>   
> -end	.req	x5
> -srcin	.req	x15
> +dstin	.req	x0
> +src	.req	x1
> +count	.req	x2
> +dst	.req	x3
> +srcend	.req	x4
> +dstend	.req	x5
> +srcin	.req	x6
> +A_l	.req	x6
> +A_lw	.req	w6
> +A_h	.req	x7
> +B_l	.req	x8
> +B_lw	.req	w8
> +B_h	.req	x9
> +C_l	.req	x10
> +C_lw	.req	w10
> +C_h	.req	x11
> +D_l	.req	x12
> +D_lw	.req	w12
> +D_h	.req	x13
> +E_l	.req	x14
> +E_h	.req	x15
> +F_l	.req	x16
> +F_h	.req	x17
> +G_l	.req	count
> +G_h	.req	dst
> +H_l	.req	src
> +H_h	.req	srcend
> +tmp1	.req	x14
> +
>   SYM_FUNC_START(__arch_copy_to_user)
> -	add	end, x0, x2
> +#ifdef CONFIG_AS_HAS_MOPS
> +alternative_if_not ARM64_HAS_MOPS
> +	b	.Lno_mops
> +alternative_else_nop_endif
> +	add	dstend, x0, x2
>   	mov	srcin, x1
> -#include "copy_template.S"
> -	mov	x0, #0
> +	mov	dst, dstin
> +	cpy1	dst, src, count
> +	mov	x0, #0				// Nothing left to copy
>   	ret
>   
> -	// Exception fixups
> +	// mops exception fixups
>   9996:	b.cs	9997f
>   	// Registers are in Option A format
>   	add	dst, dst, count
> @@ -77,7 +78,545 @@ SYM_FUNC_START(__arch_copy_to_user)
>   	ldrb	tmp1w, [srcin]
>   USER(9998f, sttrb tmp1w, [dst])
>   	add	dst, dst, #1
> -9998:	sub	x0, end, dst			// bytes not copied
> +9998:	sub	x0, dstend, dst			// bytes not copied
> +	ret
> +
> +.Lno_mops:
> +#endif
> +
> +#ifdef CONFIG_ARM64_PAN
> +	add	srcend, src, count
> +	add	dstend, dstin, count
> +	cmp	count, 128
> +	b.hi	.Lcopy_long
> +	cmp	count, 32
> +	b.hi	.Lcopy32_128
> +
> +	/* Small copies: 0..32 bytes.  */
> +	cmp	count, 16
> +	b.lo	.Lcopy16
> +	ldp	A_l, A_h, [src]
> +	ldp	D_l, D_h, [srcend, -16]
> +	USER(9000f, sttr A_l, [dstin])
> +	USER(9001f, sttr A_h, [dstin, 8])
> +	USER(9002f, sttr D_l, [dstend, -16])
> +	USER(9003f, sttr D_h, [dstend, -8])
> +	mov	x0, #0
> +	ret
> +
> +	/* Copy 8-15 bytes.  */
> +.Lcopy16:
> +	tbz	count, 3, .Lcopy8
> +	ldr	A_l, [src]
> +	ldr	A_h, [srcend, -8]
> +	USER(9004f, sttr A_l, [dstin])
> +	USER(9005f, sttr A_h, [dstend, -8])
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 3
> +	/* Copy 4-7 bytes.  */
> +.Lcopy8:
> +	tbz	count, 2, .Lcopy4
> +	ldr	A_lw, [src]
> +	ldr	B_lw, [srcend, -4]
> +	USER(9006f, sttr A_lw, [dstin])
> +	USER(9007f, sttr B_lw, [dstend, -4])
> +	mov	x0, #0
> +	ret
> +
> +	/* Copy 0..3 bytes using a branchless sequence.  */
> +.Lcopy4:
> +	cbz	count, .Lcopy0
> +	lsr tmp1, count, #1
> +	add dst, dstin, count, lsr #1
> +	ldrb A_lw, [src]
> +	ldrb C_lw, [srcend, -1]
> +	ldrb B_lw, [src, tmp1]
> +	USER(9008f, sttrb A_lw, [dstin])
> +	USER(9009f, sttrb B_lw, [dst])
> +	USER(9010f, sttrb C_lw, [dstend, -1])
> +.Lcopy0:
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Medium copies: 33..128 bytes.  */
> +.Lcopy32_128:
> +	ldp	A_l, A_h, [src]
> +	ldp	B_l, B_h, [src, 16]
> +	ldp	C_l, C_h, [srcend, -32]
> +	ldp	D_l, D_h, [srcend, -16]
> +	USER(9011f, sttr A_l, [dstin])
> +	USER(9012f, sttr A_h, [dstin, 8])
> +	USER(9013f, sttr B_l, [dstin, 16])
> +	USER(9014f, sttr B_h, [dstin, 24])
> +	cmp	count, 64
> +	b.hi	.Lcopy128
> +	USER(9015f, sttr C_l, [dstend, -32])
> +	USER(9016f, sttr C_h, [dstend, -24])
> +	USER(9017f, sttr D_l, [dstend, -16])
> +	USER(9018f, sttr D_h, [dstend, -8])
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Copy 65..128 bytes.  */
> +.Lcopy128:
> +	ldp	E_l, E_h, [src, 32]
> +	ldp	F_l, F_h, [src, 48]
> +	USER(9023f, sttr E_l, [dstin, 32])
> +	USER(9024f, sttr E_h, [dstin, 40])
> +	USER(9025f, sttr F_l, [dstin, 48])
> +	USER(9026f, sttr F_h, [dstin, 56])
> +	cmp	count, 96
> +	b.ls	.Lcopy96
> +	ldp	G_l, G_h, [srcend, -64]
> +	ldp	H_l, H_h, [srcend, -48]
> +	USER(9027f, sttr G_l, [dstend, -64])
> +	USER(9028f, sttr G_h, [dstend, -56])
> +	USER(9029f, sttr H_l, [dstend, -48])
> +	USER(9030f, sttr H_h, [dstend, -40])
> +.Lcopy96:
> +	USER(9043f, sttr C_l, [dstend, -32])
> +	USER(9044f, sttr C_h, [dstend, -24])
> +	USER(9045f, sttr D_l, [dstend, -16])
> +	USER(9046f, sttr D_h, [dstend, -8])
> +	mov	x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Copy more than 128 bytes.  */
> +.Lcopy_long:
> +	/* Copy 16 bytes and then align dst to 16-byte alignment.  */
> +	ldp	D_l, D_h, [src]
> +	and	tmp1, dstin, 15
> +	bic	dst, dstin, 15
> +	sub	src, src, tmp1
> +	add	count, count, tmp1	/* Count is now 16 too large.  */
> +	ldp	A_l, A_h, [src, 16]
> +	USER(9047f, sttr D_l, [dstin])
> +	USER(9048f, sttr D_h, [dstin, 8])
> +	ldp	B_l, B_h, [src, 32]
> +	ldp	C_l, C_h, [src, 48]
> +	ldp	D_l, D_h, [src, 64]!
> +	subs	count, count, 128 + 16	/* Test and readjust count.  */
> +	b.ls	.Lcopy64_from_end
> +
> +.Lloop64:
> +	USER(9049f, sttr A_l, [dst, 16])
> +	USER(9050f, sttr A_h, [dst, 24])
> +	ldp	A_l, A_h, [src, 16]
> +	USER(9051f, sttr B_l, [dst, 32])
> +	USER(9052f, sttr B_h, [dst, 40])
> +	ldp	B_l, B_h, [src, 32]
> +	USER(9053f, sttr C_l, [dst, 48])
> +	USER(9054f, sttr C_h, [dst, 56])
> +	ldp	C_l, C_h, [src, 48]
> +	USER(9055f, sttr D_l, [dst, 64])
> +	USER(9056f, sttr D_h, [dst, 72])
> +	add dst, dst, 64
> +	ldp	D_l, D_h, [src, 64]!
> +	subs	count, count, 64
> +	b.hi	.Lloop64
> +
> +	/* Write the last iteration and copy 64 bytes from the end.  */
> +.Lcopy64_from_end:
> +	ldp	E_l, E_h, [srcend, -64]
> +	USER(9057f, sttr A_l, [dst, 16])
> +	USER(9058f, sttr A_h, [dst, 24])
> +	ldp	A_l, A_h, [srcend, -48]
> +	USER(9059f, sttr B_l, [dst, 32])
> +	USER(9060f, sttr B_h, [dst, 40])
> +	ldp	B_l, B_h, [srcend, -32]
> +	USER(9061f, sttr C_l, [dst, 48])
> +	USER(9062f, sttr C_h, [dst, 56])
> +	ldp	C_l, C_h, [srcend, -16]
> +	USER(9063f, sttr D_l, [dst, 64])
> +	USER(9064f, sttr D_h, [dst, 72])
> +	USER(9065f, sttr E_l, [dstend, -64])
> +	USER(9066f, sttr E_h, [dstend, -56])
> +	USER(9067f, sttr A_l, [dstend, -48])
> +	USER(9068f, sttr A_h, [dstend, -40])
> +	USER(9069f, sttr B_l, [dstend, -32])
> +	USER(9070f, sttr B_h, [dstend, -24])
> +	USER(9071f, sttr C_l, [dstend, -16])
> +	USER(9072f, sttr C_h, [dstend, -8])
> + 	mov	x0, #0
> + 	ret
> +
> +#else
> +
> +	add	srcend, src, count
> +	add	dstend, dstin, count
> +	cmp	count, 128
> +	b.hi	.Lcopy_long
> +	cmp	count, 32
> +	b.hi	.Lcopy32_128
> +
> +	/* Small copies: 0..32 bytes.  */
> +	cmp	count, 16
> +	b.lo	.Lcopy16
> +	ldp	A_l, A_h, [src]
> +	ldp	D_l, D_h, [srcend, -16]
> +	USER(9000f, stp	A_l, A_h, [dstin])
> +	USER(9002f, stp	D_l, D_h, [dstend, -16])
> +	mov x0, #0
> +	ret
> +
> +	/* Copy 8-15 bytes.  */
> +.Lcopy16:
> +	tbz	count, 3, .Lcopy8
> +	ldr	A_l, [src]
> +	ldr	A_h, [srcend, -8]
> +	USER(9004f, str	A_l, [dstin])
> +	USER(9005f, str	A_h, [dstend, -8])
> +	mov x0, #0
> +	ret
> +
> +	.p2align 3
> +	/* Copy 4-7 bytes.  */
> +.Lcopy8:
> +	tbz	count, 2, .Lcopy4
> +	ldr	A_lw, [src]
> +	ldr	B_lw, [srcend, -4]
> +	USER(9006f, str	A_lw, [dstin])
> +	USER(9007f, str	B_lw, [dstend, -4])
> +	mov x0, #0
> +	ret
> +
> +	/* Copy 0..3 bytes using a branchless sequence.  */
> +.Lcopy4:
> +	cbz	count, .Lcopy0
> +	lsr	tmp1, count, 1
> +	ldrb	A_lw, [src]
> +	ldrb	C_lw, [srcend, -1]
> +	ldrb	B_lw, [src, tmp1]
> +	USER(9008f, strb	A_lw, [dstin])
> +	USER(9009f, strb	B_lw, [dstin, tmp1])
> +	USER(9010f, strb	C_lw, [dstend, -1])
> +.Lcopy0:
> +	mov x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Medium copies: 33..128 bytes.  */
> +.Lcopy32_128:
> +	ldp	A_l, A_h, [src]
> +	ldp	B_l, B_h, [src, 16]
> +	ldp	C_l, C_h, [srcend, -32]
> +	ldp	D_l, D_h, [srcend, -16]
> +	USER(9011f, stp	A_l, A_h, [dstin])
> +	USER(9013f, stp	B_l, B_h, [dstin, 16])
> +	cmp	count, 64
> +	b.hi	.Lcopy128
> +	USER(9015f, stp	C_l, C_h, [dstend, -32])
> +	USER(9017f, stp	D_l, D_h, [dstend, -16])
> +	mov x0, #0
>   	ret
> +
> +	.p2align 4
> +	/* Copy 65..128 bytes.  */
> +.Lcopy128:
> +	ldp	E_l, E_h, [src, 32]
> +	ldp	F_l, F_h, [src, 48]
> +	USER(9023f, stp	E_l, E_h, [dstin, 32])
> +	USER(9025f, stp	F_l, F_h, [dstin, 48])
> +	cmp	count, 96
> +	b.ls	.Lcopy96
> +	ldp	G_l, G_h, [srcend, -64]
> +	ldp	H_l, H_h, [srcend, -48]
> +	USER(9027f, stp	G_l, G_h, [dstend, -64])
> +	USER(9029f, stp	H_l, H_h, [dstend, -48])
> +.Lcopy96:
> +	USER(9043f, stp	C_l, C_h, [dstend, -32])
> +	USER(9045f, stp	D_l, D_h, [dstend, -16])
> +	mov x0, #0
> +	ret
> +
> +	.p2align 4
> +	/* Copy more than 128 bytes.  */
> +.Lcopy_long:
> +	/* Copy 16 bytes and then align dst to 16-byte alignment.  */
> +
> +	ldp	D_l, D_h, [src]
> +	and	tmp1, dstin, 15
> +	bic	dst, dstin, 15
> +	sub	src, src, tmp1
> +	add	count, count, tmp1	/* Count is now 16 too large.  */
> +	ldp	A_l, A_h, [src, 16]
> +	USER(9047f, stp	D_l, D_h, [dstin])
> +	ldp	B_l, B_h, [src, 32]
> +	ldp	C_l, C_h, [src, 48]
> +	ldp	D_l, D_h, [src, 64]!
> +	subs	count, count, 128 + 16	/* Test and readjust count.  */
> +	b.ls	.Lcopy64_from_end
> +
> +.Lloop64:
> +	USER(9049f, stp	A_l, A_h, [dst, 16])
> +	ldp	A_l, A_h, [src, 16]
> +	USER(9051f, stp	B_l, B_h, [dst, 32])
> +	ldp	B_l, B_h, [src, 32]
> +	USER(9053f, stp	C_l, C_h, [dst, 48])
> +	ldp	C_l, C_h, [src, 48]
> +	USER(9055f, stp	D_l, D_h, [dst, 64]!)
> +	ldp	D_l, D_h, [src, 64]!
> +	subs	count, count, 64
> +	b.hi	.Lloop64
> +
> +	/* Write the last iteration and copy 64 bytes from the end.  */
> +.Lcopy64_from_end:
> +	ldp	E_l, E_h, [srcend, -64]
> +	USER(9057f, stp	A_l, A_h, [dst, 16])
> +	ldp	A_l, A_h, [srcend, -48]
> +	USER(9059f, stp	B_l, B_h, [dst, 32])
> +	ldp	B_l, B_h, [srcend, -32]
> +	USER(9061f, stp	C_l, C_h, [dst, 48])
> +	ldp	C_l, C_h, [srcend, -16]
> +	USER(9063f, stp	D_l, D_h, [dst, 64])
> +	USER(9065f, stp	E_l, E_h, [dstend, -64])
> +	USER(9067f, stp	A_l, A_h, [dstend, -48])
> +	USER(9069f, stp	B_l, B_h, [dstend, -32])
> +	USER(9071f, stp	C_l, C_h, [dstend, -16])
> +	mov x0, #0
> +	ret
> +
> +#endif
> +
> + 	// non-mops exception fixups
> +9000:
> +9004:
> +9006:
> +9011:
> +	// Before being absolutely sure we couldn't copy anything, try harder
> +	USER(.Lcopy_none, sttrb A_lw, [dstin])
> +	b	.Lcount_minus_one
> +
> +9020:
> +	sub count, dstend, dstin
> +	b	.Lcount_minus_8
> +
> +9021:
> +	sub count, dstend, dstin
> +	b	.Lcount_minus_16
> +
> +9022:
> +	sub count, dstend, dstin
> +	b	.Lcount_minus_24
> +
> +9023:
> +	sub count, dstend, dstin
> +	b	.Lcount_minus_32
> +
> +9024:
> +	sub count, dstend, dstin
> +	b	.Lcount_minus_40
> +
> +9025:
> +	sub count, dstend, dstin
> +	b	.Lcount_minus_48
> +
> +9026:
> +	sub count, dstend, dstin
> +	b	.Lcount_minus_56
> +
> +9007:
> +	sub x0, count, #4
> +	ret
> +
> +9047:
> +	sub	count, count, tmp1
> +	USER(.Lcopy_none, sttrb D_lw, [dstin])
> +9009:
> +.Lcount_minus_one:
> +	sub x0, count, #1
> +	ret
> +
> +9001:
> +9005:
> +9012:
> +.Lcount_minus_8:
> +	sub x0, count, #8
> +	ret
> +
> +9003:
> +	add tmp1, dstin, #16
> +	sub x0, dstend, #8
> +	b	.Lmax
> +
> +9049:
> +9057:
> +	sub count, dstend, dst
> +9002:
> +9013:
> +.Lcount_minus_16:
> +	sub x0, count, #16
> +	ret
> +
> +9050:
> +9058:
> +	sub count, dstend, dst
> +9014:
> +.Lcount_minus_24:
> +	sub x0, count, #24
> +	ret
> +
> +9048:
> +	sub count, count, tmp1
> +	b	.Lcount_minus_8
> +
> +9010:
> +	mov x0, #1
> +	ret
> +
> +9018:
> +	add tmp1, dstin, #32
> +	sub x0, dstend, #8
> +	b	.Lmax
> +
> +9046:
> +	add tmp1, dstin, #64
> +	sub x0, dstend, #8
> +	b	.Lmax
> +
> +9072:
> +	add tmp1, dst, #80
> +	sub x0, dstend, #8
> +	b	.Lmax
> +
> +9017:
> +	add tmp1, dstin, #32
> +	sub x0, dstend, #16
> +	b	.Lmax
> +
> +9045:
> +	add tmp1, dstin, #64
> +	sub x0, dstend, #16
> +	b	.Lmax
> +
> +9071:
> +	add tmp1, dst, #80
> +	sub x0, dstend, #16
> +	b	.Lmax
> +
> +9016:
> +	add tmp1, dstin, #32
> +	sub x0, dstend, #24
> +	b	.Lmax
> +
> +9044:
> +	add tmp1, dstin, #64
> +	sub x0, dstend, #24
> +	b	.Lmax
> +
> +9070:
> +	add tmp1, dst, #80
> +	sub x0, dstend, #24
> +	b	.Lmax
> +
> +9015:
> +	add tmp1, dstin, #32
> +	sub x0, dstend, #32
> +	b	.Lmax
> +
> +9043:
> +	add tmp1, dstin, #64
> +	sub x0, dstend, #32
> +	b	.Lmax
> +
> +9069:
> +	add tmp1, dst, #80
> +	sub x0, dstend, #32
> +	b	.Lmax
> +
> +9030:
> +	add tmp1, dstin, #64
> +	sub x0, dstend, #40
> +	b	.Lmax
> +
> +9068:
> +	add tmp1, dst, #80
> +	sub x0, dstend, #40
> +	b	.Lmax
> +
> +9029:
> +	add tmp1, dstin, #64
> +	sub x0, dstend, #48
> +	b	.Lmax
> +
> +9067:
> +	add tmp1, dst, #80
> +	sub x0, dstend, #48
> +	b	.Lmax
> +
> +9028:
> +	add tmp1, dstin, #64
> +	sub x0, dstend, #56
> +	b	.Lmax
> +
> +9066:
> +	add tmp1, dst, #80
> +	sub x0, dstend, #56
> +	b	.Lmax
> +
> +9027:
> +	sub count, dstend, dstin
> +	b	.Lcount_minus_64
> +
> +9065:
> +	add tmp1, dst, #80
> +	sub x0, dstend, #64
> +.Lmax:
> +	cmp x0, tmp1
> +	csel x0, x0, tmp1, hi
> +	sub x0, dstend, x0
> +	ret
> +
> +9051:
> +9059:
> +	sub count, dstend, dst
> +.Lcount_minus_32:
> +	sub x0, count, #32
> +	ret
> +
> +9052:
> +9060:
> +	sub count, dstend, dst
> +.Lcount_minus_40:
> +	sub x0, count, #40
> +	ret
> +
> +9053:
> +9061:
> +	sub count, dstend, dst
> +.Lcount_minus_48:
> +	sub x0, count, #48
> +	ret
> +
> +9054:
> +9062:
> +	sub count, dstend, dst
> +.Lcount_minus_56:
> +	sub x0, count, #56
> +	ret
> +
> +9055:
> +9063:
> +	sub count, dstend, dst
> +.Lcount_minus_64:
> +	sub x0, count, #64
> +	ret
> +
> +9056:
> +9064:
> +	sub count, dstend, dst
> +	sub x0, count, #72
> +	ret
> +
> +9008:
> +.Lcopy_none: // bytes not copied at all
> +	mov x0, count
> +	ret
> +
>   SYM_FUNC_END(__arch_copy_to_user)
>   EXPORT_SYMBOL(__arch_copy_to_user)
> diff --git a/lib/tests/usercopy_kunit.c b/lib/tests/usercopy_kunit.c
> index 80f8abe10968..d4f4f9ee5f48 100644
> --- a/lib/tests/usercopy_kunit.c
> +++ b/lib/tests/usercopy_kunit.c
> @@ -22,14 +22,12 @@
>    * As there doesn't appear to be anything that can safely determine
>    * their capability at compile-time, we just have to opt-out certain archs.
>    */
> -#if BITS_PER_LONG == 64 || (!(defined(CONFIG_ARM) && !defined(MMU)) && \
> -			    !defined(CONFIG_M68K) &&		\
> -			    !defined(CONFIG_MICROBLAZE) &&	\
> -			    !defined(CONFIG_NIOS2) &&		\
> -			    !defined(CONFIG_PPC32) &&		\
> -			    !defined(CONFIG_SPARC32) &&		\
> -			    !defined(CONFIG_SUPERH))
> -# define TEST_U64
> +#if BITS_PER_LONG == 64 ||                                                   \
> +	(!(defined(CONFIG_ARM) && !defined(MMU)) && !defined(CONFIG_M68K) && \
> +	 !defined(CONFIG_MICROBLAZE) && !defined(CONFIG_NIOS2) &&            \
> +	 !defined(CONFIG_PPC32) && !defined(CONFIG_SPARC32) &&               \
> +	 !defined(CONFIG_SUPERH))
> +#define TEST_U64
>   #endif
>   
>   struct usercopy_test_priv {
> @@ -87,7 +85,7 @@ static void usercopy_test_check_nonzero_user(struct kunit *test)
>   		kmem[i] = 0xff;
>   
>   	KUNIT_EXPECT_EQ_MSG(test, copy_to_user(umem, kmem, size), 0,
> -		"legitimate copy_to_user failed");
> +			    "legitimate copy_to_user failed");
>   
>   	for (start = 0; start <= size; start++) {
>   		for (end = start; end <= size; end++) {
> @@ -95,7 +93,8 @@ static void usercopy_test_check_nonzero_user(struct kunit *test)
>   			int retval = check_zeroed_user(umem + start, len);
>   			int expected = is_zeroed(kmem + start, len);
>   
> -			KUNIT_ASSERT_EQ_MSG(test, retval, expected,
> +			KUNIT_ASSERT_EQ_MSG(
> +				test, retval, expected,
>   				"check_nonzero_user(=%d) != memchr_inv(=%d) mismatch (start=%zu, end=%zu)",
>   				retval, expected, start, end);
>   		}
> @@ -121,7 +120,7 @@ static void usercopy_test_copy_struct_from_user(struct kunit *test)
>   	/* Fill umem with a fixed byte pattern. */
>   	memset(umem_src, 0x3e, size);
>   	KUNIT_ASSERT_EQ_MSG(test, copy_to_user(umem, umem_src, size), 0,
> -		    "legitimate copy_to_user failed");
> +			    "legitimate copy_to_user failed");
>   
>   	/* Check basic case -- (usize == ksize). */
>   	ksize = size;
> @@ -130,10 +129,12 @@ static void usercopy_test_copy_struct_from_user(struct kunit *test)
>   	memcpy(expected, umem_src, ksize);
>   
>   	memset(kmem, 0x0, size);
> -	KUNIT_EXPECT_EQ_MSG(test, copy_struct_from_user(kmem, ksize, umem, usize), 0,
> -		    "copy_struct_from_user(usize == ksize) failed");
> -	KUNIT_EXPECT_MEMEQ_MSG(test, kmem, expected, ksize,
> -		    "copy_struct_from_user(usize == ksize) gives unexpected copy");
> +	KUNIT_EXPECT_EQ_MSG(test,
> +			    copy_struct_from_user(kmem, ksize, umem, usize), 0,
> +			    "copy_struct_from_user(usize == ksize) failed");
> +	KUNIT_EXPECT_MEMEQ_MSG(
> +		test, kmem, expected, ksize,
> +		"copy_struct_from_user(usize == ksize) gives unexpected copy");
>   
>   	/* Old userspace case -- (usize < ksize). */
>   	ksize = size;
> @@ -143,18 +144,21 @@ static void usercopy_test_copy_struct_from_user(struct kunit *test)
>   	memset(expected + usize, 0x0, ksize - usize);
>   
>   	memset(kmem, 0x0, size);
> -	KUNIT_EXPECT_EQ_MSG(test, copy_struct_from_user(kmem, ksize, umem, usize), 0,
> -		    "copy_struct_from_user(usize < ksize) failed");
> -	KUNIT_EXPECT_MEMEQ_MSG(test, kmem, expected, ksize,
> -		    "copy_struct_from_user(usize < ksize) gives unexpected copy");
> +	KUNIT_EXPECT_EQ_MSG(test,
> +			    copy_struct_from_user(kmem, ksize, umem, usize), 0,
> +			    "copy_struct_from_user(usize < ksize) failed");
> +	KUNIT_EXPECT_MEMEQ_MSG(
> +		test, kmem, expected, ksize,
> +		"copy_struct_from_user(usize < ksize) gives unexpected copy");
>   
>   	/* New userspace (-E2BIG) case -- (usize > ksize). */
>   	ksize = size / 2;
>   	usize = size;
>   
>   	memset(kmem, 0x0, size);
> -	KUNIT_EXPECT_EQ_MSG(test, copy_struct_from_user(kmem, ksize, umem, usize), -E2BIG,
> -		    "copy_struct_from_user(usize > ksize) didn't give E2BIG");
> +	KUNIT_EXPECT_EQ_MSG(
> +		test, copy_struct_from_user(kmem, ksize, umem, usize), -E2BIG,
> +		"copy_struct_from_user(usize > ksize) didn't give E2BIG");
>   
>   	/* New userspace (success) case -- (usize > ksize). */
>   	ksize = size / 2;
> @@ -162,13 +166,15 @@ static void usercopy_test_copy_struct_from_user(struct kunit *test)
>   
>   	memcpy(expected, umem_src, ksize);
>   	KUNIT_EXPECT_EQ_MSG(test, clear_user(umem + ksize, usize - ksize), 0,
> -		    "legitimate clear_user failed");
> +			    "legitimate clear_user failed");
>   
>   	memset(kmem, 0x0, size);
> -	KUNIT_EXPECT_EQ_MSG(test, copy_struct_from_user(kmem, ksize, umem, usize), 0,
> -		    "copy_struct_from_user(usize > ksize) failed");
> -	KUNIT_EXPECT_MEMEQ_MSG(test, kmem, expected, ksize,
> -		    "copy_struct_from_user(usize > ksize) gives unexpected copy");
> +	KUNIT_EXPECT_EQ_MSG(test,
> +			    copy_struct_from_user(kmem, ksize, umem, usize), 0,
> +			    "copy_struct_from_user(usize > ksize) failed");
> +	KUNIT_EXPECT_MEMEQ_MSG(
> +		test, kmem, expected, ksize,
> +		"copy_struct_from_user(usize > ksize) gives unexpected copy");
>   }
>   
>   /*
> @@ -182,28 +188,29 @@ static void usercopy_test_valid(struct kunit *test)
>   
>   	memset(kmem, 0x3a, PAGE_SIZE * 2);
>   	KUNIT_EXPECT_EQ_MSG(test, 0, copy_to_user(usermem, kmem, PAGE_SIZE),
> -	     "legitimate copy_to_user failed");
> +			    "legitimate copy_to_user failed");
>   	memset(kmem, 0x0, PAGE_SIZE);
>   	KUNIT_EXPECT_EQ_MSG(test, 0, copy_from_user(kmem, usermem, PAGE_SIZE),
> -	     "legitimate copy_from_user failed");
> +			    "legitimate copy_from_user failed");
>   	KUNIT_EXPECT_MEMEQ_MSG(test, kmem, kmem + PAGE_SIZE, PAGE_SIZE,
> -	     "legitimate usercopy failed to copy data");
> -
> -#define test_legit(size, check)						\
> -	do {								\
> -		size val_##size = (check);				\
> -		KUNIT_EXPECT_EQ_MSG(test, 0,				\
> -			put_user(val_##size, (size __user *)usermem),	\
> -			"legitimate put_user (" #size ") failed");	\
> -		val_##size = 0;						\
> -		KUNIT_EXPECT_EQ_MSG(test, 0,				\
> -			get_user(val_##size, (size __user *)usermem),	\
> -			"legitimate get_user (" #size ") failed");	\
> -		KUNIT_EXPECT_EQ_MSG(test, val_##size, check,		\
> -			"legitimate get_user (" #size ") failed to do copy"); \
> +			       "legitimate usercopy failed to copy data");
> +
> +#define test_legit(size, check)                                                \
> +	do {                                                                   \
> +		size val_##size = (check);                                     \
> +		KUNIT_EXPECT_EQ_MSG(                                           \
> +			test, 0, put_user(val_##size, (size __user *)usermem), \
> +			"legitimate put_user (" #size ") failed");             \
> +		val_##size = 0;                                                \
> +		KUNIT_EXPECT_EQ_MSG(                                           \
> +			test, 0, get_user(val_##size, (size __user *)usermem), \
> +			"legitimate get_user (" #size ") failed");             \
> +		KUNIT_EXPECT_EQ_MSG(test, val_##size, check,                   \
> +				    "legitimate get_user (" #size              \
> +				    ") failed to do copy");                    \
>   	} while (0)
>   
> -	test_legit(u8,  0x5a);
> +	test_legit(u8, 0x5a);
>   	test_legit(u16, 0x5a5b);
>   	test_legit(u32, 0x5a5b5c5d);
>   #ifdef TEST_U64
> @@ -225,7 +232,9 @@ static void usercopy_test_invalid(struct kunit *test)
>   
>   	if (IS_ENABLED(CONFIG_ALTERNATE_USER_ADDRESS_SPACE) ||
>   	    !IS_ENABLED(CONFIG_MMU)) {
> -		kunit_skip(test, "Testing for kernel/userspace address confusion is only sensible on architectures with a shared address space");
> +		kunit_skip(
> +			test,
> +			"Testing for kernel/userspace address confusion is only sensible on architectures with a shared address space");
>   		return;
>   	}
>   
> @@ -234,13 +243,16 @@ static void usercopy_test_invalid(struct kunit *test)
>   	memset(kmem + PAGE_SIZE, 0, PAGE_SIZE);
>   
>   	/* Reject kernel-to-kernel copies through copy_from_user(). */
> -	KUNIT_EXPECT_NE_MSG(test, copy_from_user(kmem, (char __user *)(kmem + PAGE_SIZE),
> -						 PAGE_SIZE), 0,
> -		    "illegal all-kernel copy_from_user passed");
> +	KUNIT_EXPECT_NE_MSG(test,
> +			    copy_from_user(kmem,
> +					   (char __user *)(kmem + PAGE_SIZE),
> +					   PAGE_SIZE),
> +			    0, "illegal all-kernel copy_from_user passed");
>   
>   	/* Destination half of buffer should have been zeroed. */
> -	KUNIT_EXPECT_MEMEQ_MSG(test, kmem + PAGE_SIZE, kmem, PAGE_SIZE,
> -		    "zeroing failure for illegal all-kernel copy_from_user");
> +	KUNIT_EXPECT_MEMEQ_MSG(
> +		test, kmem + PAGE_SIZE, kmem, PAGE_SIZE,
> +		"zeroing failure for illegal all-kernel copy_from_user");
>   
>   #if 0
>   	/*
> @@ -253,31 +265,36 @@ static void usercopy_test_invalid(struct kunit *test)
>   						 PAGE_SIZE), 0,
>   		    "illegal reversed copy_from_user passed");
>   #endif
> -	KUNIT_EXPECT_NE_MSG(test, copy_to_user((char __user *)kmem, kmem + PAGE_SIZE,
> -					       PAGE_SIZE), 0,
> -		    "illegal all-kernel copy_to_user passed");
> -
> -	KUNIT_EXPECT_NE_MSG(test, copy_to_user((char __user *)kmem, bad_usermem,
> -					       PAGE_SIZE), 0,
> -		    "illegal reversed copy_to_user passed");
> -
> -#define test_illegal(size, check)							\
> -	do {										\
> -		size val_##size = (check);						\
> -		/* get_user() */							\
> -		KUNIT_EXPECT_NE_MSG(test, get_user(val_##size, (size __user *)kmem), 0,	\
> -		    "illegal get_user (" #size ") passed");				\
> -		KUNIT_EXPECT_EQ_MSG(test, val_##size, 0,				\
> -		    "zeroing failure for illegal get_user (" #size ")");		\
> -		/* put_user() */							\
> -		*kmem_u64 = 0xF09FA4AFF09FA4AF;						\
> -		KUNIT_EXPECT_NE_MSG(test, put_user(val_##size, (size __user *)kmem), 0,	\
> -		    "illegal put_user (" #size ") passed");				\
> -		KUNIT_EXPECT_EQ_MSG(test, *kmem_u64, 0xF09FA4AFF09FA4AF,		\
> -		    "illegal put_user (" #size ") wrote to kernel memory!");		\
> +	KUNIT_EXPECT_NE_MSG(test,
> +			    copy_to_user((char __user *)kmem, kmem + PAGE_SIZE,
> +					 PAGE_SIZE),
> +			    0, "illegal all-kernel copy_to_user passed");
> +
> +	KUNIT_EXPECT_NE_MSG(
> +		test, copy_to_user((char __user *)kmem, bad_usermem, PAGE_SIZE),
> +		0, "illegal reversed copy_to_user passed");
> +
> +#define test_illegal(size, check)                                              \
> +	do {                                                                   \
> +		size val_##size = (check);                                     \
> +		/* get_user() */                                               \
> +		KUNIT_EXPECT_NE_MSG(test,                                      \
> +				    get_user(val_##size, (size __user *)kmem), \
> +				    0, "illegal get_user (" #size ") passed"); \
> +		KUNIT_EXPECT_EQ_MSG(                                           \
> +			test, val_##size, 0,                                   \
> +			"zeroing failure for illegal get_user (" #size ")");   \
> +		/* put_user() */                                               \
> +		*kmem_u64 = 0xF09FA4AFF09FA4AF;                                \
> +		KUNIT_EXPECT_NE_MSG(test,                                      \
> +				    put_user(val_##size, (size __user *)kmem), \
> +				    0, "illegal put_user (" #size ") passed"); \
> +		KUNIT_EXPECT_EQ_MSG(test, *kmem_u64, 0xF09FA4AFF09FA4AF,       \
> +				    "illegal put_user (" #size                 \
> +				    ") wrote to kernel memory!");              \
>   	} while (0)
>   
> -	test_illegal(u8,  0x5a);
> +	test_illegal(u8, 0x5a);
>   	test_illegal(u16, 0x5a5b);
>   	test_illegal(u32, 0x5a5b5c5d);
>   #ifdef TEST_U64
> @@ -286,13 +303,136 @@ static void usercopy_test_invalid(struct kunit *test)
>   #undef test_illegal
>   }
>   
> +/* Test fault handling when copying from/to user mode */
> +static void usercopy_test_fault_handling(struct kunit *test)
> +{
> +	size_t start, len;
> +	struct usercopy_test_priv *priv = test->priv;
> +	const size_t size = 256;
> +	char __user *umem_gp = priv->umem + 2 * PAGE_SIZE;
> +	char __user *umem = umem_gp - size;
> +	char *kmem0 = priv->kmem;
> +	char *kmem1 = priv->kmem + size;
> +	const char fill_char = 0xff;
> +	const char override_char = 0xcc; /* cannot be 0 */
> +
> +	KUNIT_ASSERT_LT_MSG(test, size * 2, PAGE_SIZE,
> +			    "size * 2 is larger than PAGE_SIZE");
> +
> +	/* Copy to the guard page should fail with no byte copied */
> +	for (len = 1; len < size; len++) {
> +		KUNIT_ASSERT_EQ_MSG(
> +			test, copy_to_user(umem_gp, kmem1, len), len,
> +			"copy_to_user copied more than 1 byte to guard page");
> +	}
> +
> +	for (start = size - 1; start != 0; start--) {
> +		for (len = size - start + 1; len <= size; len++) {
> +			memset(kmem1, fill_char, size);
> +			KUNIT_EXPECT_EQ_MSG(test,
> +					    copy_to_user(umem, kmem1, size), 0,
> +					    "legitimate copy_to_user failed");
> +			memset(kmem1 + start, override_char, len);
> +
> +			/*
> +			 * This copy_to_user should partially fail with retval containing the
> +			 * number of bytes not copied
> +			 */
> +			unsigned long retval =
> +				copy_to_user(umem + start, kmem1 + start, len);
> +
> +			KUNIT_EXPECT_NE_MSG(
> +				test, retval, 0,
> +				"copy_to_user should not copy all the bytes (start=%zu, len=%zu)",
> +				start, len);
> +			KUNIT_EXPECT_LE_MSG(
> +				test, retval, len - 1,
> +				"copy_to_user should at least copy 1 byte (start=%zu, len=%zu)",
> +				start, len);
> +
> +			/* copy the umem page to kernel to check */
> +			KUNIT_EXPECT_EQ_MSG(test,
> +					    copy_from_user(kmem0, umem, size),
> +					    0,
> +					    "legitimate copy_to_user failed");
> +
> +			char *tmp =
> +				memchr_inv(kmem0 + start, override_char, len);
> +
> +			KUNIT_EXPECT_TRUE_MSG(
> +				test, tmp,
> +				"memchr_inv returned NULL (start=%zu, len=%zu)",
> +				start, len);
> +
> +			unsigned long expected = len - (tmp - (kmem0 + start));
> +
> +			KUNIT_EXPECT_EQ_MSG(
> +				test, retval, expected,
> +				"copy_to_user(=%zu) != memchr_inv(=%zu) mismatch (start=%zu, len=%zu)",
> +				retval, expected, start, len);
> +		}
> +	}
> +
> +	for (len = 1; len < size; len++) {
> +		/* Copy from the guard page should fail immediately */
> +		KUNIT_ASSERT_EQ_MSG(
> +			test, copy_from_user(kmem0, umem_gp, len), len,
> +			"copy_from_user copied more than 1 byte to guard page");
> +	}
> +
> +	for (start = size - 1; start != 0; start--) {
> +		for (len = size - start + 1; len <= size; len++) {
> +			memset(kmem0, override_char, size);
> +			KUNIT_EXPECT_EQ_MSG(test,
> +					    copy_to_user(umem, kmem0, size), 0,
> +					    "legitimate copy_to_user failed");
> +			memset(kmem0 + start, fill_char, len);
> +
> +			/*
> +			 * This copy_from_user should partially fail with retval containing
> +			 * the number of bytes not copied
> +			 */
> +			unsigned long retval = copy_from_user(
> +				kmem0 + start, umem + start, len);
> +
> +			KUNIT_EXPECT_NE_MSG(
> +				test, retval, 0,
> +				"copy_from_user should not copy all the bytes (start=%zu, len=%zu)",
> +				start, len);
> +			KUNIT_EXPECT_LE_MSG(
> +				test, retval, len - 1,
> +				"copy_from_user should at least copy 1 byte (start=%zu, len=%zu)",
> +				start, len);
> +
> +			char *tmp =
> +				memchr_inv(kmem0 + start, override_char, len);
> +
> +			KUNIT_EXPECT_TRUE_MSG(
> +				test, tmp,
> +				"memchr_inv returned NULL (start=%zu, len=%zu)",
> +				start, len);
> +
> +			unsigned long expected = len - (tmp - (kmem0 + start));
> +
> +			KUNIT_EXPECT_EQ_MSG(
> +				test, retval, expected,
> +				"copy_from_user(=%zu) != memchr_inv(=%zu) mismatch (start=%zu, len=%zu)",
> +				retval, expected, start, len);
> +		}
> +	}
> +}
> +
>   static int usercopy_test_init(struct kunit *test)
>   {
>   	struct usercopy_test_priv *priv;
>   	unsigned long user_addr;
> +	int ret;
> +	size_t total_size;
>   
>   	if (!IS_ENABLED(CONFIG_MMU)) {
> -		kunit_skip(test, "Userspace allocation testing not available on non-MMU systems");
> +		kunit_skip(
> +			test,
> +			"Userspace allocation testing not available on non-MMU systems");
>   		return 0;
>   	}
>   
> @@ -304,13 +444,19 @@ static int usercopy_test_init(struct kunit *test)
>   	priv->kmem = kunit_kmalloc(test, priv->size, GFP_KERNEL);
>   	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv->kmem);
>   
> -	user_addr = kunit_vm_mmap(test, NULL, 0, priv->size,
> -			    PROT_READ | PROT_WRITE | PROT_EXEC,
> -			    MAP_ANONYMOUS | MAP_PRIVATE, 0);
> +	/* add an extra guard page */
> +	total_size = priv->size + PAGE_SIZE;
> +	user_addr = kunit_vm_mmap(test, NULL, 0, total_size,
> +				  PROT_READ | PROT_WRITE,
> +				  MAP_ANONYMOUS | MAP_PRIVATE, 0);
>   	KUNIT_ASSERT_NE_MSG(test, user_addr, 0,
> -		"Could not create userspace mm");
> +			    "Could not create userspace mm");
>   	KUNIT_ASSERT_LT_MSG(test, user_addr, (unsigned long)TASK_SIZE,
> -		"Failed to allocate user memory");
> +			    "Failed to allocate user memory");
> +
> +	ret = vm_munmap(user_addr + priv->size, PAGE_SIZE);
> +	KUNIT_ASSERT_EQ_MSG(test, ret, 0, "Failed to unmap guard page");
> +
>   	priv->umem = (char __user *)user_addr;
>   
>   	return 0;
> @@ -321,6 +467,7 @@ static struct kunit_case usercopy_test_cases[] = {
>   	KUNIT_CASE(usercopy_test_invalid),
>   	KUNIT_CASE(usercopy_test_check_nonzero_user),
>   	KUNIT_CASE(usercopy_test_copy_struct_from_user),
> +	KUNIT_CASE(usercopy_test_fault_handling),
>   	{}
>   };
>   

