Message-ID: <20250609221914.38ad6738@pumpkin>
Date: Mon, 9 Jun 2025 22:19:14 +0100
From: David Laight <david.laight.linux@...il.com>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: torvalds@...ux-foundation.org, mingo@...hat.com, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for
inlined ops
On Thu, 5 Jun 2025 18:47:33 +0200
Mateusz Guzik <mjguzik@...il.com> wrote:
> gcc is over eager to use rep movsq/stosq (starts above 40 bytes), which
> comes with a significant penalty on CPUs without the respective fast
> short ops bits (FSRM/FSRS).
>
> Another point is that even uarchs with FSRM don't necessarily have FSRS (Ice
> Lake and Sapphire Rapids don't).
>
> More importantly, rep movsq is not fast even if FSRM is present.
Which architecture is that?
I got exactly the same timings for 'rep movsb' and 'rep movsq' when
I did some tests on Intel CPUs going back to Ivy Bridge.
I do need to redo them though: I've worked out how to time them
without using mfence/lfence, and that should give a reasonable
estimate of the setup cost.
(I can measure the data-dependency of a single divide...)
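For anyone wanting to repeat the comparison, here is a minimal sketch of
one way to set it up. It is not the harness behind the timings mentioned
above: the helper names (copy_movsb, copy_movsq, ns_per_copy) are made up
for illustration, and it uses clock_gettime() averaged over many
iterations rather than a cycle counter. That measures back-to-back
throughput, so it will mostly hide the per-call setup cost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Copy n bytes with 'rep movsb' (plain memcpy on non-x86-64 builds). */
static void copy_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__)
    /* rdi = dst, rsi = src, rcx = byte count; all three are updated. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Copy n bytes with 'rep movsq'; n must be a multiple of 8. */
static void copy_movsq(void *dst, const void *src, size_t n)
{
    assert(n % 8 == 0);
#if defined(__x86_64__)
    size_t qwords = n / 8;      /* rcx counts 8-byte words here */
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

typedef void (*copy_fn)(void *, const void *, size_t);

/* Average wall-clock nanoseconds per call of fn for an n-byte copy. */
static double ns_per_copy(copy_fn fn, size_t n, unsigned iters)
{
    static unsigned char src[4096], dst[4096];
    struct timespec t0, t1;

    assert(n <= sizeof(src) && iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iters; i++)
        fn(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Calling ns_per_copy(copy_movsb, 64, 1000000) and
ns_per_copy(copy_movsq, 64, 1000000) gives two numbers to compare for a
64-byte copy. The indirect call adds the same overhead to both sides, so
only the difference between them is meaningful.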
David