Message-ID: <20251127095801.0473d641@pumpkin>
Date: Thu, 27 Nov 2025 09:58:01 +0000
From: david laight <david.laight@...box.com>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: x86@...nel.org, glx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, linux-kernel@...r.kernel.org,
torvalds@...ux-foundation.org, olichtne@...hat.com, atomasov@...hat.com,
aokuliar@...hat.com
Subject: Re: performance anomaly in rep movsq/movsb as seen on Sapphire
Rapids executing sync_regs()
On Thu, 27 Nov 2025 07:55:27 +0100
Mateusz Guzik <mjguzik@...il.com> wrote:
> Sapphire Rapids has both ERMS (of course) and FSRM.
>
> sync_regs() runs into a corner case where both rep movsq and rep movsb
> suffer a massive penalty when used to copy 168 bytes, which goes away
> when the data is copied by a bunch of movq instead.
>
> I verified the issue is not present on AMD EPYC 9454, I don't know about
> other Intel CPUs.
On pretty much all Intel CPUs, 'rep movsb' and 'rep movsq' seem to be
implemented by the same hardware, so the length in the 'q' case is
just multiplied by 8.
(That goes all the way back to Sandy Bridge.)
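A minimal sketch of the two forms being compared, assuming x86-64 and GCC/Clang extended inline asm. copy_movsb() and copy_movsq() are hypothetical helper names, not kernel functions; on other architectures they fall back to memcpy().

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes with a single 'rep movsb' (x86-64 only; memcpy elsewhere). */
static void copy_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__)
    /* rdi = dst, rsi = src, rcx = byte count; DF is clear per the SysV ABI */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Copy nq 8-byte words with a single 'rep movsq'. Architecturally this
 * copies the same bytes as 'rep movsb' with rcx = nq * 8. */
static void copy_movsq(void *dst, const void *src, size_t nq)
{
#if defined(__x86_64__)
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(nq)
                     :
                     : "memory");
#else
    memcpy(dst, src, nq * 8);
#endif
}
```

For the 168-byte copy in sync_regs() this is copy_movsb(dst, src, 168) versus copy_movsq(dst, src, 21); if both go through the same microcode, both should see the same penalty.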
I'm guessing all the copies have the same page alignment?
I found some strange alignment-related issues on a Zen 5 CPU.
Mostly, neither the source nor the destination alignment made much difference
(apart from, IIRC, 64-byte aligning the destination doubling throughput).
But some copies were horribly slow.
It was something like: copies where the page offset of the destination
was less than 64 bytes from the page offset of the source, and the source
wasn't on a page boundary (the byte alignment wasn't relevant).
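The slow case described above can be written as a predicate on the two page offsets. This is a hypothetical sketch, not a characterisation of Zen 5: it assumes 4 KiB pages and reads "less than 64 bytes from" as the circular distance between the offsets in either direction. maybe_slow_copy() and ASSUMED_PAGE_SIZE are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Offset of an address within its page. */
static unsigned page_offset(uintptr_t addr)
{
    return (unsigned)(addr & (ASSUMED_PAGE_SIZE - 1));
}

/*
 * Hypothetical test for the slow case: the source is not page aligned and
 * the destination's page offset is within 64 bytes of the source's.
 * The distance is measured modulo the page size, in either direction
 * (the direction is an assumption).
 */
static bool maybe_slow_copy(uintptr_t src, uintptr_t dst)
{
    unsigned s = page_offset(src);
    unsigned d = page_offset(dst);
    /* (d - s) mod page size; unsigned wrap-around is well defined */
    unsigned delta = (d - s) & (ASSUMED_PAGE_SIZE - 1);
    unsigned dist = delta < ASSUMED_PAGE_SIZE - delta
                        ? delta
                        : ASSUMED_PAGE_SIZE - delta;

    return s != 0 && dist < 64;
}
```

With this reading, a copy from offset 0x008 to offset 0x010 would be flagged, while the same copy from a page-aligned source would not.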
I wonder if Sapphire Rapids has some similar perversion?
Or is it one of the big/little CPUs where most of the cores are
actually Atom ones, which may not have either ERMS or FSRM?
I need to rerun those tests using data dependencies instead of lfence
and get a much better estimate of the instruction setup time.
But I lack access to old AMD and new Intel hardware.
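One possible way to serialise copies with a data dependency rather than lfence, offered as an assumption and not as how those tests work: make each copy's source address depend on a byte loaded from the previous copy's destination, and read the TSC around the whole loop. time_chained_copies() and opaque() are hypothetical helpers. The sketch assumes x86-64 with GCC or Clang (__rdtsc() from <x86intrin.h>). Because rdtsc is not itself serialising, the final reading is only approximate.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <x86intrin.h> /* __rdtsc() */
#endif

/* Copy n bytes with 'rep movsb' on x86-64; plain memcpy elsewhere. */
static void copy_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__)
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Return v unchanged, but hide its value from the optimiser. */
static uintptr_t opaque(uintptr_t v)
{
    __asm__ volatile("" : "+r"(v));
    return v;
}

/*
 * Time 'iters' back-to-back copies of 'len' bytes (len >= 1). Each copy's
 * source address depends on the last byte stored by the previous copy, so
 * the copies are ordered by a data dependency instead of by lfence.
 * Returns elapsed TSC ticks (always 0 on non-x86-64 builds).
 */
static uint64_t time_chained_copies(unsigned char *dst,
                                    const unsigned char *src,
                                    size_t len, unsigned iters)
{
    const uintptr_t zero = opaque(0); /* 0, but unknown to the compiler */
    uintptr_t dep = 0;
    uint64_t start = 0, end = 0;

#if defined(__x86_64__)
    start = __rdtsc();
#endif
    for (unsigned i = 0; i < iters; i++) {
        /* dep & zero is always 0, but cannot be computed before dep loads */
        copy_movsb(dst, src + (dep & zero), len);
        dep = dst[len - 1]; /* load a byte the copy just stored */
    }
#if defined(__x86_64__)
    end = __rdtsc();
#endif
    return end - start;
}
```

Dividing the result by iters gives a rough per-copy cost in TSC ticks (which are not necessarily core clock cycles); repeating this over a range of lengths would separate the fixed setup time from the per-byte cost.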
David