[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+bExb_P6A9ROmwqNgGdO5o8wawVZ5r3MHnz0qfhxvTtA@mail.gmail.com>
Date: Fri, 26 May 2023 18:37:09 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: LKML <linux-kernel@...r.kernel.org>, netdev <netdev@...r.kernel.org>
Subject: Re: x86 copy performance regression
On Fri, May 26, 2023 at 6:30 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Fri, May 26, 2023 at 8:00 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > We can see rep_movs_alternative() using more cycles in kernel profiles
> > than the previous variant (copy_user_enhanced_fast_string, which was
> > simply using "rep movsb"), and we can not reach line rate (as we
> > could before the series)
>
> Hmm. I assume the attached patch ends up fixing the regression?
>
> That hack to generate the two-byte 'jae' instruction even for the
> alternative is admittedly not pretty, but I just couldn't deal with
> the alternative that generated pointlessly bad code.
>
> We could make the constant in the comparison depend on whether it is
> for the unrolled or for the erms case too, I guess, but I think erms
> is probably "good enough" with 64-byte copies.
>
> I was really hoping we could avoid this, but hey, a regression is a regression.
>
> Can you verify this patch fixes things for you?
Hmm.. my build environment does not like this yet :)
arch/x86/lib/copy_user_64.S:40:30: error: unexpected token in argument list
0: alternative ".byte 0x73," ".Lunrolled" "-0b-2", ".byte 0x73,"
".Llarge" "-0b-2", X86_FEATURE_ERMS
^
make[3]: *** [scripts/Makefile.build:374: arch/x86/lib/copy_user_64.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:494: arch/x86/lib] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:2026: .] Error 2
Powered by blists - more mailing lists