Message-Id: <ffe019a1-11b4-4ad7-bbe2-8ef3e01ffeb0@app.fastmail.com>
Date: Tue, 28 Jan 2025 10:25:18 +0100
From: "Arnd Bergmann" <arnd@...db.de>
To: "Julian Vetter" <julian@...er-limits.org>,
"Yoshinori Sato" <ysato@...rs.sourceforge.jp>,
"Rich Felker" <dalias@...c.org>,
"John Paul Adrian Glaubitz" <glaubitz@...sik.fu-berlin.de>
Cc: linux-sh@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sh: Remove IO memcpy and memset from sh code
On Tue, Jan 28, 2025, at 09:42, Julian Vetter wrote:
> Remove IO memcpy and memset from sh specific code and fall back to the
> new implementation from lib/iomem_copy.c. It uses word accesses if the
> buffers are aligned and only falls back to byte accesses for potentially
> unaligned parts of a buffer. Keep only the SH4 optimized memcpy_fromio.
>
> Signed-off-by: Julian Vetter <julian@...er-limits.org>
This looks good in principle, but I see one mistake:
> +#ifdef CONFIG_CPU_SH4
> +void memcpy_fromio(void *to, const volatile void __iomem *from, size_t
> count)
> {
> /*
> * Would it be worthwhile doing byte and long transfers first
> * to try and get aligned?
> */
> -#ifdef CONFIG_CPU_SH4
> if ((count >= 0x20) &&
> (((u32)to & 0x1f) == 0) && (((u32)from & 0x3) == 0)) {
> int tmp2, tmp3, tmp4, tmp5, tmp6;
> @@ -53,59 +50,6 @@ void memcpy_fromio(void *to, const volatile void
> __iomem *from, unsigned long co
> : "7"(from), "0" (to), "1" (count)
> : "r0", "r7", "t", "memory");
> }
> -#endif
> -
> - if ((((u32)to | (u32)from) & 0x3) == 0) {
> - for (; count > 3; count -= 4) {
> - *(u32 *)to = *(volatile u32 *)from;
> - to += 4;
> - from += 4;
> - }
> - }
> -
The SH4 version still needs the bottom of the function to
handle data that is not a multiple of 32 bytes long.
I would expect gcc to produce a properly optimized
version for sh4 from the generic code as well, so I would
suggest you remove it entirely and rely on the common code
here.
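
To illustrate the point: the SH4 assembly path only handles full
32-byte chunks from a suitably aligned source, so the code after it
must still copy whatever remains. A plain-C sketch of the shape (this
is an illustration, not the kernel's actual implementation or the
SH4 assembly):

```c
#include <stdint.h>
#include <stddef.h>

static void copy_fromio_sketch(void *to, const void *from, size_t count)
{
	uint8_t *d = to;
	const uint8_t *s = from;

	/* Fast path: whole 32-byte blocks, standing in for the
	 * SH4 assembly loop in memcpy_fromio(). */
	while (count >= 32) {
		for (int i = 0; i < 32; i++)
			d[i] = s[i];
		d += 32;
		s += 32;
		count -= 32;
	}

	/* Tail: without this, the last (count % 32) bytes would
	 * never be copied -- this is what the removed hunk did. */
	while (count--)
		*d++ = *s++;
}
```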
Arnd