lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAHk-=wiJ6QCbrkwxcPr1MJPy=iMv7-_W-wnHgw2_gJf5ufUzaQ@mail.gmail.com>
Date:   Sat, 7 Oct 2023 11:07:08 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Uros Bizjak <ubizjak@...il.com>
Cc:     "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
        Brian Gerst <brgerst@...il.com>, linux-kernel@...r.kernel.org,
        x86@...nel.org, Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Mika Penttilä <mpenttil@...hat.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH v2 0/6] x86: Clean up fast syscall return validation

On Sat, 7 Oct 2023 at 02:57, Uros Bizjak <ubizjak@...il.com> wrote:
>
> Please note that 8-bit movb instruction in fact represents insert into
> word-mode register. The compiler does not know how this word-mode
> register will be used, so to avoid partial register stalls, it takes a
> cautious approach and (with -O2) moves constant to a register with a
> word-width instruction.

Yes. In this case I really think it's our kernel code that is disgusting.

People seem to think that because cpu_feature_enabled() uses static
jumps, it's somehow "free". Not so. The static jumps are often
horrendous for code quality, and they effectively make the compiler
blind to otherwise obvious optimizations.

We need to stop using those so blindly. I see code like this

       return pgtable_l5_enabled() ? PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;

and it just makes me sad. And yes, that virtual mask is disgusting. This:

    #define __VIRTUAL_MASK_SHIFT       (pgtable_l5_enabled() ? 56 : 47)

is *NOT* cheap. The end result is *not* a constant, it's a two-way
branch, and while the branch is static and the compiler may then
optimize each side of the branch with the constant in question, most
of the time it just results in *disgusting* code like this where gcc
just has to load a constant into a register, and treat it as a
variable.

With the code shuffling, it's possibly *more* expensive than just
loading from memory, in that you just change a D$ load into an I$ one
instead. I'm sure that looks great on benchmarks that stay entirely in
the I$, but ...

Something like using alternatives is *much* cheaper. Better yet is the
suggestion to simplify the conditionals entirely to not depend on the
virtual shift at all (ie our recent access_ok() simplifications that
were due to _another_ CPU feature)

> Also, the compiler is quite eager to CSE constants. When there are two
> or more uses of the same constant, it will move a constant into the
> register first.

Now, honestly, that may be a good thing for other architectures that
have issues with constants, but on x86 - and integer constants - it's
likely a bad thing unless the constant is particularly complicated.

Most x86 instructions will take the usual 8/32-bit constants (but if
we have arbitrary 64-bit ones, you tend to need them in registers).

             Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ