Message-ID: <CAFULd4b91Tr9Q2p4a20eusC+QO6O81gxY+nP-zpFiFKGTmLpYg@mail.gmail.com>
Date: Thu, 19 Oct 2023 19:21:22 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: peterz@...radead.org, Nadav Amit <namit@...are.com>,
"the arch/x86 maintainers" <x86@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...nel.org>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H . Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Nick Desaulniers <ndesaulniers@...gle.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()
On Thu, Oct 19, 2023 at 7:00 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Thu, 19 Oct 2023 at 00:04, Uros Bizjak <ubizjak@...il.com> wrote:
> >
> > Let me explain how the compiler handles volatile.
>
> We're talking past each other.
>
> You are talking about the volatile *memory* ops, and the
> difference that "raw" vs "this" would cause with and without the
> "volatile".
>
> While *I* am now convinced that the memory ops aren't even an option,
> because they will generate worse code, because pretty much all users
> use the "this" version (which would have to use volatile),
Please see [1]. Even with volatile access, with memory ops the
compiler can propagate operands, resulting in a ~8k code size reduction
and many hundreds (if not thousands) of MOVs propagated into subsequent
instructions. Please note the many code examples in [1]. This is not
possible with the asm variant.
[1] https://lore.kernel.org/lkml/20231004192404.31733-1-ubizjak@gmail.com/
> Because if we just stick with inline asms, the need for "volatile"
> simply goes away.
No, the compiler is then free to remove or duplicate the asm (among
other unwanted optimizations); please see the end of section 6.47.2.1
in [2].
[2] https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Extended-Asm.html#Volatile-1
> The existing volatile on those percpu inline asms is *wrong*. It's a
> historical mistake.
Please see above.
> And with just a plain non-volatile inline asm, the inline asm wins.
Please see [1] for the code propagation argument.
> It doesn't have the (bad) read-once behavior of a volatile memory op.
>
> And it also doesn't have the (horrible correctness issue)
> rematerialization behavior of a non-volatile memory op.
Unfortunately, it does. Without volatile, an asm can be rematerialized
in the same way as it can be CSEd. OTOH, with the memory-ops approach
the memory operand is cast to volatile in the this_* case, so it
certainly won't get rematerialized.
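Why rematerialization matters for this_* reads can be modelled at the
source level. This is only a sketch under stated assumptions: two
"CPUs" are simulated with an array, and a hypothetical migrate()
switches cur_cpu the way preemption plus migration could. Nothing here
forces the compiler to rematerialize anything; the two functions simply
spell out the two possible outcomes (pcpu_slot, cur_cpu,
this_read_slot() and migrate() are all made-up names).

```c
/* Hypothetical model, not kernel code: two per-CPU slots and the
 * index of the CPU the task currently runs on. */
static int pcpu_slot[2] = { 100, 200 };
static int cur_cpu;

/* this_*-style read: volatile cast, so exactly one access. */
static int this_read_slot(void)
{
	return *(volatile int *)&pcpu_slot[cur_cpu];
}

/* Stand-in for preemption followed by migration to the other CPU. */
static void migrate(void)
{
	cur_cpu ^= 1;
}

/* Intended semantics: read once, keep (or spill) the value. */
int read_once_then_migrate(void)
{
	int v = this_read_slot();

	migrate();
	return v + v;		/* both uses see the same value */
}

/* What a rematerialized load would amount to: the second use
 * re-executes the read after migration and sees another CPU's slot. */
int rematerialized_model(void)
{
	int v1 = this_read_slot();

	migrate();
	int v2 = this_read_slot();	/* modelled "rematerialized" re-read */
	return v1 + v2;
}
```

Starting from cur_cpu == 0, read_once_then_migrate() returns 200 while
rematerialized_model() returns 300, i.e. the two uses disagree.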
> A compiler that were to rematerialize an inline asm (instead of
> spilling) would be a bad joke. That's not an optimization, that's just
> a crazy bad compiler with a code generation bug.
But that is what the compiler does without volatile.
Thanks,
Uros.