Message-ID: <CAHk-=whd82fzhEbFRw9d_EMtR1SeefOJabjCHcm4-6jzeqqd3g@mail.gmail.com>
Date: Thu, 20 Mar 2025 12:23:38 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: x86@...nel.org, hkrzesin@...hat.com, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com, olichtne@...hat.com,
atomasov@...hat.com, aokuliar@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86: handle the tail in rep_movs_alternative() with an
overlapping store
On Thu, 20 Mar 2025 at 12:06, Mateusz Guzik <mjguzik@...il.com> wrote:
>
> Sizes ranged <8,64> are copied 8 bytes at a time with a jump out to a
> 1 byte at a time loop to handle the tail.
I definitely do not mind this patch, but I think it doesn't go far enough.
It gets rid of the byte-at-a-time loop at the end, but only for the
short-copy case of 8-63 bytes.
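
For reference, the overlapping-store idea can be sketched in plain C. This is only an illustration, not the kernel's assembly: the helper name copy_overlapping_tail is hypothetical, it assumes n >= 8, and it ignores the fault handling a real user copy needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not kernel code: copy n bytes (n >= 8) in 8-byte
 * words and finish the tail with one overlapping word instead of a
 * byte-at-a-time loop.
 */
static void copy_overlapping_tail(unsigned char *dst,
				  const unsigned char *src, size_t n)
{
	size_t off = 0;
	uint64_t w;

	assert(n >= 8);	/* the overlapping store needs at least one full word */

	/* Copy whole 8-byte words while at least 8 bytes remain. */
	while (n - off >= 8) {
		memcpy(&w, src + off, 8);
		memcpy(dst + off, &w, 8);
		off += 8;
	}

	/*
	 * 1..7 bytes left: copy the word that ends at the last byte.
	 * It overlaps bytes already copied and rewrites them with the
	 * same data.
	 */
	if (off < n) {
		memcpy(&w, src + n - 8, 8);
		memcpy(dst + n - 8, &w, 8);
	}
}
```

With n = 29 the loop copies bytes 0-23, and the final store covers bytes 21-28, rewriting bytes 21-23 with identical data.
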
The .Llarge_movsq ends up still doing
        testl %ecx,%ecx
        jne .Lcopy_user_tail
        RET
and while that is only triggered by the non-ERMS case, that's what
most older AMD CPUs will trigger, afaik.
So I think that if we do this, we should do it properly.
Linus