Message-ID: <alpine.DEB.2.21.2502202009470.65342@angie.orcam.me.uk>
Date: Thu, 20 Feb 2025 21:05:54 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: Richard Henderson <richard.henderson@...aro.org>
cc: Ivan Kokshaysky <ink@...een.parts>, Matt Turner <mattst88@...il.com>,
Arnd Bergmann <arnd@...db.de>,
John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>,
Magnus Lindholm <linmag7@...il.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Al Viro <viro@...iv.linux.org.uk>, linux-alpha@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Alpha: Emulate unaligned LDx_L/STx_C for data
consistency
On Thu, 20 Feb 2025, Richard Henderson wrote:
> > Complementing compiler support for the `-msafe-bwa' and `-msafe-partial'
> > code generation options slated to land in GCC 15,
>
> Pointer? I can't find it on the gcc-patches list.
Here:
<https://inbox.sourceware.org/gcc-patches/alpine.DEB.2.21.2501050246590.49841@angie.orcam.me.uk/>
and hopefully in your inbox/archive somewhere as well.
> > 7. At this point both whole data quantities have been written, ensuring
> > that no third-party intervening write has changed them at the point
> > of the write from the values held at previous LDx_L. Therefore 1 is
> > returned in the intended register as the result of the trapping STx_C
> > instruction.
>
> I think general-purpose non-atomic emulation of STx_C is a really bad idea.
>
> Without looking at your gcc patches, I can guess what you're after: you've
> generated a ll/sc sequence for (aligned) short, and want to emulate if it
> happens to be unaligned.
It's a corner case, yes: the compiler was told the access would be
aligned, but it turns out not to be. It happens where you cast a (char *)
pointer that wasn't suitably aligned to (short *) and dereference it
(and the quadword case arises similarly at the ends of misaligned inline
`memcpy'/`memset').
Only two cases (plus a bug in the GM2 frontend) hit this throughout the
GCC testsuite, which shows how rare this is.
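For illustration only, a minimal C sketch of how such a pointer arises. The helper name `aligned_for_short' is hypothetical, and the sketch only tests the address rather than dereferencing it, since the dereference itself is undefined behaviour in ISO C:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* The sketch assumes `short' has an alignment requirement above 1. */
static_assert(alignof(short) > 1, "short is expected to need alignment");

/* Hypothetical helper: true if `p' is suitably aligned for a `short'. */
static bool aligned_for_short(const void *p)
{
    return ((uintptr_t)p % alignof(short)) == 0;
}
```

With an 8-byte-aligned `char buf[16]', `aligned_for_short(buf)' holds while `aligned_for_short(buf + 1)' does not, so `*(short *)(buf + 1)' is the kind of access the compiler was promised would never happen.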
> Crucially, when emulating non-aligned, you should not strive to make it
> atomic. No other architecture promises atomic non-aligned stores, so why
> should you do that here?
This code doesn't strive to be atomic, but to preserve data *outside* the
quantity accessed from being clobbered, and for this purpose an atomic
sequence is both inevitable and sufficient, for both partial quantities
around the unaligned quantity written. The trapping code does not expect
atomicity for the unaligned quantity itself -- it is handled in pieces
just as, say, with MIPS SWL/SWR masked store instruction pairs -- and this
code, effectively an Alpha/Linux psABI extension, does not guarantee it
either.
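To make the distinction concrete, here is a rough user-space model in C, not the kernel code from the patch. It assumes little-endian byte numbering within a quadword, uses a C11 compare-and-swap loop as a stand-in for an LDQ_L/STQ_C retry loop (the two are not equivalent in general), and the names `merge_quad' and `store_in_pieces' are made up for this sketch. Each aligned quadword is updated atomically so that its neighbouring bytes survive, while the unaligned quantity as a whole is written in two pieces:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically replace the bytes selected by `mask' in *q with `bits',
   leaving every other byte of the aligned quadword as it was. */
static void merge_quad(_Atomic uint64_t *q, uint64_t mask, uint64_t bits)
{
    uint64_t old = atomic_load(q);

    /* On failure `old' is refreshed with the current value; retry. */
    while (!atomic_compare_exchange_weak(q, &old,
                                         (old & ~mask) | (bits & mask)))
        ;
}

/* Store the low `len' bytes of `val' at byte offset `off' from mem[0].
   The store may straddle into mem[1]; the two pieces are written
   separately, so the quantity itself is not stored atomically. */
static void store_in_pieces(_Atomic uint64_t mem[2], unsigned off,
                            unsigned len, uint64_t val)
{
    assert(off <= 7 && len >= 1 && len <= 8);

    uint64_t vmask = len == 8 ? UINT64_MAX
                              : (UINT64_C(1) << (8 * len)) - 1;
    val &= vmask;

    /* Low piece: bytes that land in the first aligned quadword. */
    merge_quad(&mem[0], vmask << (8 * off), val << (8 * off));

    /* High piece, if the quantity crosses the quadword boundary. */
    if (off + len > 8) {
        unsigned sh = 8 * (8 - off);    /* Bits already stored above. */
        merge_quad(&mem[1], vmask >> sh, val >> sh);
    }
}
```

For example, storing the 16-bit value 0xabcd at offset 7 changes only byte 7 of mem[0] and byte 0 of mem[1]; a concurrent write to any other byte of either quadword is not lost.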
> I suggest some sort of magic code sequence,
>
> bic addr_in, 6, addr_al
> loop:
> ldq_l t0, 0(addr_al)
> magic-nop done - loop
> inswl data, addr_in, t1
> mskwl t0, addr_in, t0
> bis t0, t1, t0
> stq_c t0, 0(addr_al)
> beq t0, loop
> done:
>
> With the trap, match the magic-nop, pick out the input registers from the
> following inswl, perform the two (atomic!) byte stores to accomplish the
> emulation, adjust the pc forward to the done label.
It seems to make no sense to me to penalise all user code for the corner
case mentioned above while still having the emulation in the kernel, given
that 99.999...% of accesses will have been correctly aligned by GCC. And
it gets even more complex when you have an awkward number of bytes to
mask, such as 3, 5, 6, or 7. This will happen, for example, if inline
`memcpy' is expanded by GCC for a quadword-aligned block of 31 bytes: other
instructions will then be used for masking/insertion of the trailing 7
bytes, and the block may still turn out misaligned at run time.
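The arithmetic behind the 31-byte example can be sketched in C as follows. The helper names are hypothetical and this says nothing about which instructions GCC actually emits; it only shows how the length splits and what byte mask the tail needs, assuming little-endian byte numbering:

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole quadwords in a block of `len' bytes. */
static unsigned whole_quads(unsigned len)
{
    return len / 8;
}

/* Number of trailing bytes left over after the whole quadwords. */
static unsigned tail_bytes(unsigned len)
{
    return len % 8;
}

/* Mask covering the low `n' bytes of a quadword. */
static uint64_t byte_mask(unsigned n)
{
    assert(n <= 8);
    return n == 8 ? UINT64_MAX : (UINT64_C(1) << (8 * n)) - 1;
}
```

A 31-byte block gives `whole_quads(31) == 3' and `tail_bytes(31) == 7', and `byte_mask(7)' is 0x00ffffffffffffff, a mask that does not correspond to any single word or longword insert/mask operation.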
I'm unconvinced; it seems a lot of hassle for little gain to me.
Maciej