Message-ID: <20251126142201.27e23076@pumpkin>
Date: Wed, 26 Nov 2025 14:22:01 +0000
From: david laight <david.laight@...box.com>
To: Helge Deller <deller@....de>
Cc: John Johansen <john.johansen@...onical.com>, Helge Deller
 <deller@...nel.org>, John Paul Adrian Glaubitz
 <glaubitz@...sik.fu-berlin.de>, linux-kernel@...r.kernel.org,
 apparmor@...ts.ubuntu.com, linux-security-module@...r.kernel.org,
 linux-parisc@...r.kernel.org
Subject: Re: [PATCH 0/2] apparmor unaligned memory fixes

On Wed, 26 Nov 2025 12:03:03 +0100
Helge Deller <deller@....de> wrote:

> On 11/26/25 11:44, david laight wrote:
...   
> >> diff --git a/security/apparmor/match.c b/security/apparmor/match.c
> >> index 26e82ba879d44..3dcc342337aca 100644
> >> --- a/security/apparmor/match.c
> >> +++ b/security/apparmor/match.c
> >> @@ -71,10 +71,10 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
> >>    				     u8, u8, byte_to_byte);  
> > 
> > Is that just memcpy()?  
> 
> No, it's memcpy() only on big-endian machines.

You've misread the quoting...
The 'data8' case (only half of which is visible in the quoted hunk) is
a memcpy().

> On little-endian machines it converts from big-endian
> 16/32-bit ints to little-endian 16/32-bit ints.
> 
> But I see some potential for optimization here:
> a) on big-endian machines just use memcpy()

true

> b) on little-endian machines use memcpy() to copy from possibly-unaligned
>     memory to then known-to-be-aligned destination. Then use a loop with
>     be32_to_cpu() instead of get_unaligned_xx() as it's faster.

There is a function that does a loop byteswap of a buffer - no reason
to re-invent it.
But I doubt it is always (if ever) faster to do a copy and then byteswap.
The loop control and extra memory accesses kill performance.
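FWIW, roughly the two variants being compared, as a hand-written sketch
(not the actual apparmor code - the unpack_u32s_*() names are made up,
and I'm going from memory on the helper/header names):

	#include <linux/types.h>
	#include <linux/string.h>
	#include <linux/unaligned.h>
	#include <asm/byteorder.h>

	/* (b) memcpy() to an aligned buffer, then a loop byteswap in place */
	static void unpack_u32s_copy_then_swap(u32 *dst, const void *blob, size_t n)
	{
		size_t i;

		memcpy(dst, blob, n * sizeof(u32));	/* dst is known aligned */
		for (i = 0; i < n; i++)
			dst[i] = be32_to_cpu((__force __be32)dst[i]);
	}

	/* single pass: one unaligned big-endian load per element */
	static void unpack_u32s_single_pass(u32 *dst, const void *blob, size_t n)
	{
		const u8 *src = blob;
		size_t i;

		for (i = 0; i < n; i++)
			dst[i] = get_unaligned_be32(src + i * sizeof(u32));
	}

The "loop byteswap of a buffer" helper I had in mind is
be32_to_cpu_array() (if I remember the name right).  On big-endian
be32_to_cpu() is a no-op, so the first variant collapses to a plain
memcpy() - which is your (a).  On little-endian it is the second pass
over the buffer that I'd expect to hurt.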

Not that I've seen a fast get_unaligned() - I don't think gcc or clang
generate optimal code.  For LE I think it is something like:
	low = *(addr & ~3);
	high = *((addr + 3) & ~3);
	shift = (addr & 3) * 8;
	value = low >> shift | high << (32 - shift);
Note that it is only two aligned memory reads - even for 64-bit.
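Spelled out as compilable C (userspace sketch, LE only; the name
load_unaligned_le32() is made up, and real code also has to worry about
strict aliasing and about reading up to 3 bytes past the end of the
buffer when the address is unaligned):

	#include <stdint.h>

	static uint32_t load_unaligned_le32(const void *p)
	{
		uintptr_t addr = (uintptr_t)p;
		const uint32_t *base = (const uint32_t *)(addr & ~(uintptr_t)3);
		unsigned int shift = (addr & 3) * 8;
		uint32_t low = base[0];
		uint32_t high;

		if (!shift)	/* aligned: one load, and avoids a shift by 32 */
			return low;

		high = base[1];	/* ((addr + 3) & ~3) is just the next word */
		return (low >> shift) | (high << (32 - shift));
	}

The same construction with 64-bit words gives an unaligned 64-bit load
from two aligned 8-byte reads.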

	David
