linux-kernel - Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAFULd4b91Tr9Q2p4a20eusC+QO6O81gxY+nP-zpFiFKGTmLpYg@mail.gmail.com>
Date:   Thu, 19 Oct 2023 19:21:22 +0200
From:   Uros Bizjak <ubizjak@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     peterz@...radead.org, Nadav Amit <namit@...are.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

On Thu, Oct 19, 2023 at 7:00 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Thu, 19 Oct 2023 at 00:04, Uros Bizjak <ubizjak@...il.com> wrote:
> >
> > Let me explain how the compiler handles volatile.
>
> We're talking past each other.
>
> You are talking about the volatile *memory* ops, and the the
> difference that "raw" vs "this" would cause with and without the
> "volatile".
>
> While *I* am now convinced that the memory ops aren't even an option,
> because they will generate worse code, because pretty much all users
> use the "this" version (which would have to use volatile),

Please see [1]. Even with volatile access, with memory ops the
compiler can propagate operands, resulting in ~8k code size reduction,
and many hundreds (if not thousands) MOVs propagated into subsequent
instructions. Please note many code examples in [1]. This is not
possible with the asm variant.

[1] https://lore.kernel.org/lkml/20231004192404.31733-1-ubizjak@gmail.com/

> Because if we just stick with inline asms, the need for "volatile"
> simply goes away.

No, the compiler is then free to remove or duplicate the asm (plus
other unwanted optimizations), please see the end of chapter 6.47.2.1
in [2].

[2] https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Extended-Asm.html#Volatile-1

> The existing volatile on those percpu inline asms is *wrong*. It's a
> historical mistake.

Please see above.

> And with just a plain non-volatile inline asm, the inline asm wins.

Please see [1] for the code propagation argument.

> It doesn't have the (bad) read-once behavior of a volatile memory op.
>
> And it also doesn't have the (horrible correctness issue)
> rematerialization behavior of a non-volatile memory op.

Unfortunately, it does. Without volatile, asm can be rematerialized in
the same way as it can be CSEd. OTOH, the memory op with memory-ops
approach is casted to volatile in this_* case, so it for sure won't
get rematerialized.

> A compiler that were to rematerializes an inline asm (instead of
> spilling) would be a bad joke. That's not an optimization, that's just
> a crazy bad compiler with a code generation bug.

But that is what the compiler does without volatile.

Thanks,
Uros.