Message-ID: <CAHk-=whqNMUPbjCyMjyxfH_5-Xass=DrMkPT5ZTJbFrtU=qDEQ@mail.gmail.com>
Date: Fri, 26 May 2023 09:29:59 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Eric Dumazet <edumazet@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
netdev <netdev@...r.kernel.org>
Subject: Re: x86 copy performance regression
On Fri, May 26, 2023 at 8:00 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> We can see rep_movs_alternative() using more cycles in kernel profiles
> than the previous variant (copy_user_enhanced_fast_string, which was
> simply using "rep movsb"), and we can not reach line rate (as we
> could before the series)

Hmm. I assume the attached patch ends up fixing the regression?

That hack to generate the two-byte 'jae' instruction even for the
alternative is admittedly not pretty, but I just couldn't deal with
the alternative that generated pointlessly bad code.

We could make the constant in the comparison depend on whether it is
for the unrolled or for the erms case too, I guess, but I think erms
is probably "good enough" with 64-byte copies.

I was really hoping we could avoid this, but hey, a regression is a regression.

Can you verify this patch fixes things for you?

                Linus
View attachment "patch.diff" of type "text/x-patch" (1194 bytes)
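
For readers without the attachment, the size-threshold idea discussed in the message can be sketched roughly as follows. This is a portable C illustration only, not the attached patch and not the kernel's assembly. The names BULK_COPY_THRESHOLD, choose_copy_path() and sketch_copy() are hypothetical; the value 64 is taken from the "64-byte copies" remark above, and memcpy() merely stands in for the bulk path ("rep movsb" or the unrolled loop). The unsigned ">=" comparison corresponds to what a 'jae' branch tests.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: copies of at least this many bytes take the bulk path.
 * The value 64 is an assumption based on the "64-byte copies" remark. */
#define BULK_COPY_THRESHOLD 64

enum copy_path { COPY_PATH_SMALL, COPY_PATH_BULK };

/* Unsigned "above or equal" comparison, the condition a 'jae' branch tests. */
static enum copy_path choose_copy_path(size_t len)
{
    return len >= BULK_COPY_THRESHOLD ? COPY_PATH_BULK : COPY_PATH_SMALL;
}

/* Copy len bytes and report which path was taken. */
static enum copy_path sketch_copy(void *dst, const void *src, size_t len)
{
    enum copy_path path = choose_copy_path(len);

    if (path == COPY_PATH_BULK) {
        /* Stand-in for the bulk path (e.g. "rep movsb" on ERMS hardware). */
        memcpy(dst, src, len);
    } else {
        /* Short copies: a plain byte loop avoids the bulk path's startup cost. */
        unsigned char *d = dst;
        const unsigned char *s = src;

        while (len--)
            *d++ = *s++;
    }
    return path;
}
```

Making the constant depend on the variant, as the message suggests as a possibility, would amount to choosing a different BULK_COPY_THRESHOLD for the unrolled and erms cases.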