Message-ID: <3e54439d-24cf-8b9d-6b5a-efb756f3a5be@ans.pl>
Date: Thu, 21 Jan 2021 01:29:23 -0800
From: Krzysztof Olędzki <ole@....pl>
To: Andy Lutomirski <luto@...nel.org>, x86@...nel.org
Cc: LKML <linux-kernel@...r.kernel.org>,
Krzysztof Mazur <krzysiek@...lesie.net>,
Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH v3 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage
On 2021-01-20 at 21:09, Andy Lutomirski wrote:
> This series fixes two regressions: a boot failure on AMD K7 and a
> performance regression on everything.
>
> I did a double-take here -- the regressions were reported by different
> people, both named Krzysztof :)
>
> Changes from v2:
> - Tidy up the if statements (Sean)
> - Changelog and comment improvements (Boris)
>
> Changes from v1:
> - Fix MMX better -- MMX really does need FNINIT.
> - Improve the EFI code.
> - Rename the KFPU constants.
> - Changelog improvements.
>
> Andy Lutomirski (4):
> x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
> x86/mmx: Use KFPU_387 for MMX string operations
> x86/fpu: Make the EFI FPU calling convention explicit
> x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
Hi Andy,

I have tested the new patchset on the following CPUs, running 5.4.90
(with some adjustments required for it to apply) and 5.10.9 kernels:
- AMD Phenom(tm) II X3 B77 Processor (family: 0x10, model: 0x4, stepping: 0x3)
- Intel(R) Xeon(R) CPU 3070 @ 2.66GHz (family: 0x6, model: 0xf, stepping: 0x6)
- Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz (family: 0x6, model: 0x3a, stepping: 0x9)
For all of them, the patchset recovers most of the performance lost
due to the introduction of "Reset MXCSR to default in kernel_fpu_begin"
(figures are relative to the original performance):
- B77: 90% instead of 82% for prefetch64-sse, 92% instead of 84% for generic_sse
- 3070: 93% instead of 86% for prefetch64-sse, 93% instead of 88% for generic_sse
- 1280v2: 99% instead of 88% for prefetch64-sse, 99% instead of 88% for generic_sse
For some reason, with the patchset applied the 1280v2 (Ivy Bridge) shows
almost no remaining regression for prefetch64-sse and generic_sse. The
only remaining issue is that AVX is still at 67% of its original
performance, which is at least better than 60%. The other two CPUs have
no AVX.
I used 64-bit kernels for testing; please let me know if 32-bit testing
is also needed.
Tested-by: Krzysztof Piotr Olędzki <ole@....pl>
Thanks,
Krzysztof