Message-ID: <5162ba3a-0b4c-a295-44dd-7ea2f17ca74d@ans.pl>
Date: Mon, 18 Jan 2021 00:25:16 -0800
From: Krzysztof Olędzki <ole@....pl>
To: Andy Lutomirski <luto@...nel.org>, x86@...nel.org
Cc: LKML <linux-kernel@...r.kernel.org>,
Krzysztof Mazur <krzysiek@...lesie.net>,
Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage
On 2021-01-17 at 22:20, Andy Lutomirski wrote:
> This series fixes two regressions: a boot failure on AMD K7 and a
> performance regression on everything.
>
> I did a double-take here -- the regressions were reported by different
> people, both named Krzysztof :)
>
> Andy Lutomirski (4):
> x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
> x86/mmx: Use KFPU_MMX for MMX string operations
> x86/fpu: Make the EFI FPU calling convention explicit
> x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
Thank you so much Andy!
What a coincidence! Sadly, my AMD K7 is sitting somewhere in a closet,
on a different continent, and last ran Linux over 10 years ago. :/
However, I can offer some testing on different AMD & Intel CPUs.
Now... It is 12 AM here, so I tested it very quickly and only on 5.4-stable,
where I initially noticed the problem. The patch applies almost cleanly
to this release; "almost" because arch/x86/platform/efi/efi_64.c there does
not have a kernel_fpu_begin() call to update. The kernel compiles and boots.
Here is the result for:
Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz (family: 0x6, model: 0x3a,
stepping: 0x9)
5.4-stable (with "Reset MXCSR to default in kernel_fpu_begin"):
avx : 21072.000 MB/sec
prefetch64-sse: 20392.000 MB/sec
generic_sse: 18572.000 MB/sec
xor: using function: avx (21072.000 MB/sec)
5.4-stable-c4db485dd3f2378b4923503aed995f7816e265b7-revert:
avx : 33764.000 MB/sec
prefetch64-sse: 23432.000 MB/sec
generic_sse: 21036.000 MB/sec
xor: using function: avx (33764.000 MB/sec)
5.4-stable-kernel_fpu_begin_mask:
avx : 23576.000 MB/sec
prefetch64-sse: 23024.000 MB/sec
generic_sse: 20880.000 MB/sec
xor: using function: avx (23576.000 MB/sec)
So, the performance regression for prefetch64-sse and generic_sse is
almost gone, but the AVX code is still impacted. Not as much as before,
but still noticeably: avx is now barely faster than the fixed prefetch64-sse.
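To put the three runs above in relative terms, here is a minimal Python sketch that expresses each throughput as a percentage of the revert kernel. The MB/sec figures are copied from the results above; the dictionary keys and the helper name percent_of_baseline are only illustrative, and treating the revert as the baseline is an assumption made for this comparison.

```python
# xor benchmark throughput in MB/sec, copied from the results above.
# The kernel labels used as dictionary keys are shortened, illustrative names.
RESULTS = {
    "mxcsr-reset": {
        "avx": 21072.0, "prefetch64-sse": 20392.0, "generic_sse": 18572.0,
    },
    "revert": {
        "avx": 33764.0, "prefetch64-sse": 23432.0, "generic_sse": 21036.0,
    },
    "kernel_fpu_begin_mask": {
        "avx": 23576.0, "prefetch64-sse": 23024.0, "generic_sse": 20880.0,
    },
}


def percent_of_baseline(results, baseline="revert"):
    """Return each throughput as a percentage of the baseline kernel's value."""
    base = results[baseline]
    return {
        kernel: {name: 100.0 * mbps / base[name] for name, mbps in rates.items()}
        for kernel, rates in results.items()
    }


if __name__ == "__main__":
    for kernel, pct in percent_of_baseline(RESULTS).items():
        print(kernel)
        for name, value in pct.items():
            print(f"  {name:<15} {value:6.1f}% of revert")
```

By this measure the patched 5.4 kernel brings both SSE variants to within about 2% of the revert, while avx stays at roughly 70% of it.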
I'm going to test the patches on 5.10 / 5.11-rc, and on different CPUs, to
make sure that what I have seen on 5.4 is not due to a backporting mistake.
However, this may have to wait until Tuesday / Wednesday due to family
duties, as Monday is a holiday here.
Best regards,
Krzysztof Olędzki