linux-kernel - Re: [PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for inlined ops

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250609221914.38ad6738@pumpkin>
Date: Mon, 9 Jun 2025 22:19:14 +0100
From: David Laight <david.laight.linux@...il.com>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: torvalds@...ux-foundation.org, mingo@...hat.com, x86@...nel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for
 inlined ops

On Thu,  5 Jun 2025 18:47:33 +0200
Mateusz Guzik <mjguzik@...il.com> wrote:

> gcc is over eager to use rep movsq/stosq (starts above 40 bytes), which
> comes with a significant penalty on CPUs without the respective fast
> short ops bits (FSRM/FSRS).
> 
> Another point is that even uarchs with FSRM don't necessarily have FSRS (Ice
> Lake and Sapphire Rapids don't).
> 
> More importantly, rep movsq is not fast even if FSRM is present.

Which architecture is that?
I got exactly the same timings for 'rep movsb' and 'rep movsq' when
I did some tests on Intel cpu going back to Ivy bridge.

I do need to redo them though, I've worked out how to time them
without using mfence/lfence and that should give a reasonable
estimation of the setup cost.
(I can measure the data-dependency of a single divide...)

	David