netdev - Re: x86 copy performance regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whqNMUPbjCyMjyxfH_5-Xass=DrMkPT5ZTJbFrtU=qDEQ@mail.gmail.com>
Date: Fri, 26 May 2023 09:29:59 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Eric Dumazet <edumazet@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>, netdev <netdev@...r.kernel.org>
Subject: Re: x86 copy performance regression

On Fri, May 26, 2023 at 8:00 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> We can see rep_movs_alternative() using more cycles in kernel profiles
> than the previous variant (copy_user_enhanced_fast_string, which was
> simply using "rep  movsb"), and we can not reach line rate (as we
> could before the series)

Hmm. I assume the attached patch ends up fixing the regression?

That hack to generate the two-byte 'jae' instruction even for the
alternative is admittedly not pretty, but I just couldn't deal with
the alternative that generated pointlessly bad code.

We could make the constant in the comparison depend on whether it is
for the unrolled or for the erms case too, I guess, but I think erms
is probably "good enough" with 64-byte copies.

I was really hoping we could avoid this, but hey, a regression is a regression.

Can you verify this patch fixes things for you?

                  Linus

View attachment "patch.diff" of type "text/x-patch" (1194 bytes)