[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgCPbkf0Kdi=4T3LAVvNEH0jxJBWcTiBkrFDBsxkC9mKQ@mail.gmail.com>
Date: Wed, 18 Oct 2023 13:34:23 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: Nadav Amit <namit@...are.com>,
"the arch/x86 maintainers" <x86@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...nel.org>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H . Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Nick Desaulniers <ndesaulniers@...gle.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()
On Wed, 18 Oct 2023 at 13:22, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> And yes, sometimes we use actual volatile accesses for them
> (READ_ONCE() and WRITE_ONCE()) but those are *horrendous* in general,
> and are much too strict. Not only does gcc generally lose its mind
> when it sees volatile (ie it stops doing various sane combinations
> that would actually be perfectly valid), but it obviously also stops
> doing CSE on the loads (as it has to).
Note, in case you wonder what I mean by "lose its mind", try this
(extremely stupid) test program:
void a(volatile int *i) { ++*i; }
void b(int *i) { ++*i; }
and note that the non-volatile version does
addl $1, (%rdi)
but the volatile version then refuses to combine the read+write into a
rmw instruction, and generates
movl (%rdi), %eax
addl $1, %eax
movl %eax, (%rdi)
instead.
Sure, it's correct, but it's an example of how 'volatile' ends up
disabling a lot of other optimizations than just the "don't remove the
access".
Doing the volatile as one rmw instruction would still have been very
obviously valid - it's still doing a read and a write. You don't need
two instructions for that.
I'm not complaining, and I understand *why* it happens - compiler
writers very understandably go "oh, I'm not touching that".
I'm just trying to point out that volatile really screws up code
generation even aside from the "access _exactly_ once" issue.
So using inline asm and relying on gcc doing (minimal) CSE will then
generate better code than volatile ever could, even when we just use a
simple 'mov" instruction. At least you get that basic combining
effect, even if it's not great.
And for memory ops, *not* using volatile is dangerous when they aren't stable.
Linus
Powered by blists - more mailing lists