linux-kernel - Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wgCPbkf0Kdi=4T3LAVvNEH0jxJBWcTiBkrFDBsxkC9mKQ@mail.gmail.com>
Date:   Wed, 18 Oct 2023 13:34:23 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Uros Bizjak <ubizjak@...il.com>
Cc:     Nadav Amit <namit@...are.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

On Wed, 18 Oct 2023 at 13:22, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> And yes, sometimes we use actual volatile accesses for them
> (READ_ONCE() and WRITE_ONCE()) but those are *horrendous* in general,
> and are much too strict. Not only does gcc generally lose its mind
> when it sees volatile (ie it stops doing various sane combinations
> that would actually be perfectly valid), but it obviously also stops
> doing CSE on the loads (as it has to).

Note, in case you wonder what I mean by "lose its mind", try this
(extremely stupid) test program:

    void a(volatile int *i) { ++*i; }
    void b(int *i) { ++*i; }

and note that the non-volatile version does

    addl $1, (%rdi)

but the volatile version then refuses to combine the read+write into a
rmw instruction, and generates

    movl (%rdi), %eax
    addl $1, %eax
    movl %eax, (%rdi)

instead.

Sure, it's correct, but it's an example of how 'volatile' ends up
disabling a lot of other optimizations than just the "don't remove the
access".

Doing the volatile as one rmw instruction would still have been very
obviously valid - it's still doing a read and a write. You don't need
two instructions for that.

I'm not complaining, and I understand *why* it happens - compiler
writers very understandably go "oh, I'm not touching that".

I'm just trying to point out that volatile really screws up code
generation even aside from the "access _exactly_ once" issue.

So using inline asm and relying on gcc doing (minimal) CSE will then
generate better code than volatile ever could, even when we just use a
simple 'mov" instruction. At least you get that basic combining
effect, even if it's not great.

And for memory ops, *not* using volatile is dangerous when they aren't stable.

              Linus