linux-kernel - Re: [PATCH] Alpha: Emulate unaligned LDx_L/STx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.2502202106200.65342@angie.orcam.me.uk>
Date: Thu, 20 Feb 2025 22:09:20 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: Linus Torvalds <torvalds@...ux-foundation.org>
cc: Richard Henderson <richard.henderson@...aro.org>, 
    Ivan Kokshaysky <ink@...een.parts>, Matt Turner <mattst88@...il.com>, 
    Arnd Bergmann <arnd@...db.de>, 
    John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>, 
    Magnus Lindholm <linmag7@...il.com>, 
    "Paul E. McKenney" <paulmck@...nel.org>, Al Viro <viro@...iv.linux.org.uk>, 
    linux-alpha@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Alpha: Emulate unaligned LDx_L/STx_C for data
 consistency

On Wed, 19 Feb 2025, Linus Torvalds wrote:

> > 1. A trapping unaligned LDx_L operation results in the pair of adjacent
> >    aligned whole data quantities spanned being read and stored for the
> >    reference with a subsequent STx_C operation, along with the width of
> >    the data accessed and its virtual address, and the task referring or
> >    NULL if the kernel.  The validity marker is set.
> 
> So I have a couple of comments. I don't care deeply because I do think
> alpha is dead, but for completeness:
> 
>  (a) I don't think the task checking is correct.
> 
> You're only checking the task pointer, so what can happen is that a
> task exits and another starts up with the same task pointer value, and
> it all matches across one task doing a ld_l and another doing a st_c.
> 
> Does this matter? No. You'd literally have to *try* to create that
> situation with identical mis-aligned addresses and data contents, and
> an exit after a 'ld_l', and doing a 'st_c' in the new task without the
> matching ld_l.
> 
> So I suspect this works in practice, but it's still worth mentioning.

 It's an interesting corner case, but executing STx_C without preceding 
matching LDx_L is hardly useful and depending on the circumstances may or 
may not cause unpredictable results.  We can make it a part of the psABI 
that such usage is not supported.  Otherwise we could clear `ll_bit' in 
a task's termination path.

>  (b) this is not truly atomic wrt concurrent aligned non-trapping
> operations to the same words. Or in fact to current trapping ones,
> since you end up inevitably releasing the spinlock before the final
> stc emulation.

 It is not supposed to be.  The only objective of this code is to protect 
the *unchanged* part of the longword/quadword.

> I think this is fundamental and non-fixable, because the stc is done
> as two operations, and the first can succeed with the second failing
> (or they can both succeed, just interleaved with other accesses).

 It is absolutely fine and by design.  Atomicity of the unaligned store of 
the quantity itself is not guaranteed by this sequence, just as it is not 
with the original one using a pair of STQ_U operations, where a concurrent 
write to the same location may cause the parts of the quantity to become 
inconsistent with each other.

 If the trapping code wanted to make the data quantity accessed atomic, it 
should have declared it atomic as well as made it aligned, in which case 
the compiler would have done the right thing, including padding/separation 
from other data objects where necessary for the minimum width of data that 
hardware can guarantee atomicity for.  And then we would not have arrived 
in this emulation path in the first place.

> Again, I don't think we care, and it works in practice, but it does
> mean that I *really* think that:
> 
>  (c) you should not handle the kernel case at all.
> 
> If the kernel does an unaligned ld_l/st_c, that's a fundamental kernel
> bug. Don't emulate it. Particularly when the emulation fundamentally
> is not truly atomic wrt other accesses.

 Good point actually, I think I mentally drove myself into a dead end 
here.  Yes, absolutely, it is not expected to happen unless we have a bug 
in our code somewhere!

> Finally:
> 
>  (d) I think you're doing a few too many inline asms by hand, and
> you're masking the results too much.
> 
> On the read-side emulation, why do you do that
> 
> +               "1:     ldl %3,0(%5)\n"
> +               "2:     ldl %4,4(%5)\n"
> +               "       srl %3,%6,%1\n"
> +               "       sll %4,%7,%2\n"
> +               "       zapnot %1,15,%1\n"
> +               "       zapnot %2,15,%2\n"
> 
> at all? Just do two aligned loads, and don't mask the bytes around
> them. A *real* ldl/stc will fail not just when the exact bytes are
> different, but when somebody has touched the same cacheline. So if the
> aligned values have changed, you should fail the stc even if the
> change was in other bytes.

 We do need to extract the bytes desired for the result of LDx_L though.

 Yes, it can be done in C, but the same stands for all the other emulation
pieces here and yet they use inline assembly.  GCC is happy to do byte 
extraction itself where necessary, it has machine description patterns for 
that as it a fairly common operation (it does not for INSxH and MSKxH 
though, they're handled as intrinsics only, which however we could use 
instead).

 I think there is value in consistency, and with this piece written as 
inline assembly you can spot the difference from the other variants right 
away.  Or I could rewrite the byte extraction in C across other patterns.

> And doing two aligned loads don't need any inline asm at all.

 Neither does unaligned loads, as GCC is happy to emit LDQ_U itself where 
necessary, but we want to catch exceptions or we'd get an oops rather than 
SIGSEGV.

> On the st_c side, I think you're repeating the same inline asm twice,
> and should have a single helper.

 The masks are different as are displacements, but a good point otherwise.  
I think this guarantees the prologue to the atomic loop to go away, which 
is a good thing, but I'll yet double-check the quality of code produced.

> Is this a NAK for the patch? No. But I do think it should be massaged a bit.

 Thank you for your review, I'll post v2 once I'm done with the massage.

  Maciej