[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANpmjNP6qRm0M9Za-=Lm8N87o-GukO5nQeAVpW=bbXRnip4Fvg@mail.gmail.com>
Date: Tue, 27 Jan 2026 16:04:45 +0100
From: Marco Elver <elver@...gle.com>
To: David Laight <david.laight.linux@...il.com>
Cc: Arnd Bergmann <arnd@...db.de>, Peter Zijlstra <peterz@...radead.org>, Will Deacon <will@...nel.org>,
Ingo Molnar <mingo@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, Boqun Feng <boqun.feng@...il.com>,
Waiman Long <longman@...hat.com>, Bart Van Assche <bvanassche@....org>, llvm@...ts.linux.dev,
Catalin Marinas <catalin.marinas@....com>, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
On Tue, 27 Jan 2026 at 15:30, David Laight <david.laight.linux@...il.com> wrote:
>
> On Tue, 27 Jan 2026 13:01:22 +0100
> Marco Elver <elver@...gle.com> wrote:
>
> > On Mon, 26 Jan 2026 at 23:24, Arnd Bergmann <arnd@...db.de> wrote:
> > >
> > > On Mon, Jan 26, 2026, at 20:54, Marco Elver wrote:
> > > > On Mon, Jan 26, 2026 at 08:56AM +0100, Arnd Bergmann wrote:
> > > >> On Mon, Jan 26, 2026, at 01:25, Marco Elver wrote:
> > > >>
> > > >> How does this work with CC_HAS_TYPEOF_UNQUAL=false?
> > > >>
> > > >> As far as I can tell, TYPEOF_UNQUAL() falls back to __typeof__
> > > >> on gcc-13, clang-18 and earlier, and not strip out qualifiers.
> > > >
> > > > I think we only need to worry about Clang for LTO builds. But yeah, our
> > > > minimum supported Clang is 15, so between 15-18 it'd be broken.
> > >
> > > Right, I missed the #ifdef CONFIG_LTO check, so indeed gcc is
> > > fine here.
> > >
> > > >> With fd69b2f7d5f4 ("compiler: Use __typeof_unqual__() for
> > > >> __unqual_scalar_typeof()"), I would expect __unqual_scalar_typeof()
> > > >> to do the right thing already.
> > > >
> > > > It'd still be broken for Clang 15-18, so it won't help much. We need
> > > > this to work for more than "scalar", so even though it'll work for Clang
> > > > 19+ given the redefinition to __typeof_unqual__, we should deprecate the
> > > > _Generic-based __unqual_scalar_typeof() sooner than later.
> > > >
> > > > I was able to make this work for older compilers:
> > > >
> > > ...
> > > > #define __READ_ONCE(x) \
> > > > ({ \
> > > > auto __x = &(x); \
> > > > - auto __ret = (TYPEOF_UNQUAL(*__x) *)__x, *__retp = &__ret; \
> > > > - union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> > > > + auto __ret = (__read_once_typeof(*__x) *)__x, *__retp = &__ret; \
> > > > + union { __read_once_typeof(*__x) __val; char __c[1]; } __u; \
> > > > *__retp = &__u.__val; \
> > >
> > > >
> > > > Thoughts?
> > >
> > > Looks better than __unqual_scalar_typeof() to me. Would it make
> > > sense to do the same __read_once_typeof() in the asm-generic
> > > version of __READ_ONCE()? I don't remember if we discussed it
> > > in the thread leading up to dee081bf8f82 ("READ_ONCE: Drop
> > > pointer qualifiers when reading from scalar types").
> > > We probably didn't have __auto_type back then.
> >
> > I don't see the point for the asm-generic __READ_ONCE(): it's not
> > wrong to cast to 'const volatile volatile T*' nor 'const volatile
> > const T*' etc., which is dereferenced directly and not stored in any
> > temporary variable when used in READ_ONCE(). I actually don't know why
> > __unqual_scalar_typeof() is used in the asm-generic __READ_ONCE(),
> > which just adds 'const volatile' right back.
>
> That looks historic, once upon a time the code was:
> typeof(x) __x = __READ_ONCE(x);
> smp_read_barrier_depends();
> __x;
> but const needed stripping and someone decided to cast the result back
> to the original type.
I think the bigger issue was 'volatile': "Passing a volatile-qualified
pointer to READ_ONCE() is an absolute
trainwreck for code generation [...]" per dee081bf8f82.
While the above with the temporary is gone for the asm-generic
version, the arm64 LTO still has the temporary and we need some
qualifier-stripping helper.
And a similar pattern also still exists for the asm-generic variant of
__smp_load_acquire() and friends.
> Of course that is pointless since the qualifiers aren't relevant on an
> 'rvalue' so are then discarded.
> Then the barrier and temporary variable got removed.
> (The barrier was only needed for alpha - which has its own __READ_ONCE()).
>
> In the arm LTO version the ?: will also remove the qualifiers.
>
> > And __READ_ONCE_SCALAR() appears to be gone, where the
> > qualifier-stripping resulted in better code-gen.
>
> The qualifier stripping was needed to read a 'const' variable.
> Looks to me like the 'better code gen' happened earlier.
> The 'earlier' version took the address of the on-stack volatile
> variable.
>
> So you may not need to remove the const/volatile qualifiers at all.
Yeah, I think so too (for the asm-generic variant that is). But I
won't mix up the arm64 fixes with possible asm-generic ones to avoid
this blowing up too much. I don't see a dependency on the asm-generic
cleanup, so we can clean it up separately; but I think it's purely
cosmetic unlike this arm64 LTO version.
Powered by blists - more mailing lists