Message-ID: <4c11854e-74d8-4e6e-92a9-c025ef330fcd@intel.com>
Date: Wed, 3 Dec 2025 13:37:06 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Mateusz Guzik <mjguzik@...il.com>, x86@...nel.org
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, linux-kernel@...r.kernel.org,
torvalds@...ux-foundation.org, olichtne@...hat.com, atomasov@...hat.com,
aokuliar@...hat.com
Subject: Re: performance anomaly in rep movsq/movsb as seen on Sapphire Rapids
executing sync_regs()
On 11/26/25 22:55, Mateusz Guzik wrote:
> I figured movsq still sucks on the uarch, so I patched the kernel to use
> movsb instead, but performance barely budged.
>
> However, forcing the thing to do the copy with regular stores in
> memcpy_orig (32 bytes per loop iteration + 8 bytes tail) unclogs it.
Any chance this can be reproduced in userspace somehow? Does any old
copy of 168 bytes do better with regular stores than rep movsq?
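Something like the below (untested sketch only -- the buffer alignment,
iteration count and use of clock_gettime() are arbitrary choices) might be
enough to check that from userspace:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define LEN   168
#define ITERS 10000000UL

/* 168 bytes == 21 quadwords, so rep movsq with rcx = len/8 copies it all */
static void copy_rep_movsq(void *dst, const void *src, size_t len)
{
	size_t qwords = len / 8;

	asm volatile("rep movsq"
		     : "+D" (dst), "+S" (src), "+c" (qwords)
		     : : "memory");
}

/* Same regular-store loop as the sketch above: 32 bytes/iter + 8-byte tail */
static void copy_stores(void *dst, const void *src, size_t len)
{
	uint64_t *d = dst;
	const uint64_t *s = src;

	while (len >= 32) {
		d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
		d += 4; s += 4; len -= 32;
	}
	while (len >= 8) {
		*d++ = *s++;
		len -= 8;
	}
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
	static uint64_t src[LEN / 8], dst[LEN / 8];
	uint64_t t0, t1;
	size_t i;

	memset(src, 0x5a, sizeof(src));

	t0 = now_ns();
	for (i = 0; i < ITERS; i++)
		copy_rep_movsq(dst, src, LEN);
	t1 = now_ns();
	printf("rep movsq:      %.2f ns/copy\n", (double)(t1 - t0) / ITERS);

	t0 = now_ns();
	for (i = 0; i < ITERS; i++) {
		copy_stores(dst, src, LEN);
		/* keep the compiler from optimizing the stores away */
		asm volatile("" : : "r" (dst) : "memory");
	}
	t1 = now_ns();
	printf("regular stores: %.2f ns/copy\n", (double)(t1 - t0) / ITERS);

	return 0;
}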