linux-kernel - Re: [PATCH] x86: handle the tail in rep_movs_alternative() with an overlapping store


Message-ID: <CAGudoHGNFT+LC24842ZKLWBxD3vvvddBqDKa6gkixN4Esor+RQ@mail.gmail.com>
Date: Thu, 20 Mar 2025 20:33:35 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: x86@...nel.org, hkrzesin@...hat.com, tglx@...utronix.de, mingo@...hat.com, 
	bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com, olichtne@...hat.com, 
	atomasov@...hat.com, aokuliar@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86: handle the tail in rep_movs_alternative() with an
 overlapping store

On Thu, Mar 20, 2025 at 8:23 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Thu, 20 Mar 2025 at 12:06, Mateusz Guzik <mjguzik@...il.com> wrote:
> >
> > Sizes ranged <8,64> are copied 8 bytes at a time with a jump out to a
> > 1 byte at a time loop to handle the tail.
>
> I definitely do not mind this patch, but I think it doesn't go far enough.
>
> It gets rid of the byte-at-a-time loop at the end, but only for the
> short-copy case of 8-63 bytes.
>

This bit I can vouch for.

> The .Llarge_movsq ends up still doing
>
>         testl %ecx,%ecx
>         jne .Lcopy_user_tail
>         RET
>
> and while that is only triggered by the non-ERMS case, that's what
> most older AMD CPU's will trigger, afaik.
>

This bit I can't.

Per my other e-mail, it has been several years (around 7 by now, I
think) since I was seriously digging in this area, and the details are
rather fuzzy.

I have a recollection that handling the tail after rep movsq with an
overlapping store suffered a penalty big enough to warrant a "normal"
copy instead, one that avoids the just-written area. I see my old
routine $elsewhere makes sure to do that. I also don't have sensible hw
to bench this on at the moment.
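
For readers unfamiliar with the term, here is a minimal C sketch of
what an "overlapping store" tail means for the short-copy case
discussed above. It is an illustration only, not the kernel's
rep_movs_alternative(): the helper name copy_8_to_64_overlap() is
hypothetical, it assumes 8 <= len <= 64, and it ignores the fault
handling a real user copy needs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not kernel code: copy len bytes from src to dst,
 * assuming 8 <= len <= 64 and non-overlapping buffers.
 */
static void copy_8_to_64_overlap(unsigned char *dst,
                                 const unsigned char *src, size_t len)
{
    uint64_t word;
    size_t off = 0;

    /* Copy whole 8-byte words while more than 8 bytes remain. */
    while (len - off > 8) {
        memcpy(&word, src + off, 8);
        memcpy(dst + off, &word, 8);
        off += 8;
    }

    /*
     * Tail: one final 8-byte store that ends exactly at dst + len.
     * When len is not a multiple of 8 it rewrites some bytes the loop
     * already stored, which replaces a byte-at-a-time tail loop.
     */
    memcpy(&word, src + len - 8, 8);
    memcpy(dst + len - 8, &word, 8);
}
```

With len = 13, the loop stores bytes 0-7 and the tail stores bytes
5-12, so bytes 5-7 are written twice. The concern above is about that
re-written region when the preceding copy was done by rep movsq.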

That said, if you insist on it, I'll repost v2 with the change (I'm
going to *test* it of course, just not bench. :>)
-- 
Mateusz Guzik <mjguzik gmail.com>