linux-kernel - Re: [PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for inlined ops

Message-ID: <CAHk-=wg-+FZTCTBY79UBMc=MT1-t5EWtGOXt=kYySUmqZU4qxQ@mail.gmail.com>
Date: Thu, 5 Jun 2025 11:32:03 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: mingo@...hat.com, x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for
 inlined ops

On Thu, 5 Jun 2025 at 09:47, Mateusz Guzik <mjguzik@...il.com> wrote:
>
> gcc is over eager to use rep movsq/stosq (starts above 40 bytes), which
> comes with a significant penalty on CPUs without the respective fast
> short ops bits (FSRM/FSRS).

I have said this before, and I'll say it again: I do not want random
crazy internal compiler tuning flags in the kernel sources.

We've had them before with things like inline limits, and it's
absolutely horrendous.

If you believe in this so much, add it to your gcc spec file. Or
continue to push gcc code improvement.

But this is not in any way kernel-specific, and I do not want to have
random "compiler internal modification flags" for code generation.

We want to have much higher-level things like "-O2" and "-march=xyz"
for optimization.

Now, for *correctness* issues like instruction choices, we will do odd
low-level internal flags like "don't use AVX", or
"-fno-strict-overflow" that are fixing ABI issues or bugs in the
language definition. So it's not like we don't ever do low-level
internal implementation compiler flags, but not for random
microarchitecture tuning.

           Linus
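
For readers who want to see the kind of inlined operation the quoted patch description refers to, here is a minimal sketch. The structure and function names are hypothetical and do not come from the patch. It only sets up a fixed-size clear and copy of 48 bytes, just above the roughly 40-byte threshold mentioned in the quoted text. Whether a given gcc version and -march setting actually emits rep stosq/movsq for them is an assumption to check by reading the generated assembly (for example, from "gcc -O2 -S").

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 48-byte structure: six 8-byte words, slightly larger than
 * the ~40-byte threshold mentioned in the quoted patch description. */
struct sample {
    unsigned long long words[6];
};

/* Fixed-size clear. The compiler may expand this inline instead of
 * calling memset(); on x86-64 that expansion might be rep stosq. */
void sample_clear(struct sample *s)
{
    memset(s, 0, sizeof(*s));
}

/* Fixed-size copy via struct assignment. The compiler may expand this
 * inline as well; on x86-64 that expansion might be rep movsq. */
void sample_copy(struct sample *dst, const struct sample *src)
{
    *dst = *src;
}
```

The behaviour of both functions is the same whichever instruction sequence the compiler picks. Only the generated code differs, which is why this is a tuning question rather than a correctness one.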