Message-ID: <20250609221914.38ad6738@pumpkin>
Date: Mon, 9 Jun 2025 22:19:14 +0100
From: David Laight <david.laight.linux@...il.com>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: torvalds@...ux-foundation.org, mingo@...hat.com, x86@...nel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for
 inlined ops

On Thu,  5 Jun 2025 18:47:33 +0200
Mateusz Guzik <mjguzik@...il.com> wrote:

> gcc is overeager to use rep movsq/stosq (starting above 40 bytes), which
> comes with a significant penalty on CPUs without the respective fast
> short ops bits (FSRM/FSRS).
> 
> Another point is that even uarchs with FSRM don't necessarily have FSRS (Ice
> Lake and Sapphire Rapids don't).
> 
> More importantly, rep movsq is not fast even if FSRM is present.

Which architecture is that?
I got exactly the same timings for 'rep movsb' and 'rep movsq' when
I ran some tests on Intel CPUs going back to Ivy Bridge.

I do need to redo them, though. I've worked out how to time them
without using mfence/lfence, and that should give a reasonable
estimate of the setup cost.
(I can measure the data dependency of a single divide...)

	David
