Message-ID: <3e54439d-24cf-8b9d-6b5a-efb756f3a5be@ans.pl>
Date:   Thu, 21 Jan 2021 01:29:23 -0800
From:   Krzysztof Olędzki <ole@....pl>
To:     Andy Lutomirski <luto@...nel.org>, x86@...nel.org
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Krzysztof Mazur <krzysiek@...lesie.net>,
        Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH v3 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage

On 2021-01-20 at 21:09, Andy Lutomirski wrote:
> This series fixes two regressions: a boot failure on AMD K7 and a
> performance regression on everything.
> 
> I did a double-take here -- the regressions were reported by different
> people, both named Krzysztof :)
> 
> Changes from v2:
>   - Tidy up the if statements (Sean)
>   - Changelog and comment improvements (Boris)
> 
> Changes from v1:
>   - Fix MMX better -- MMX really does need FNINIT.
>   - Improve the EFI code.
>   - Rename the KFPU constants.
>   - Changelog improvements.
> 
> Andy Lutomirski (4):
>    x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
>    x86/mmx: Use KFPU_387 for MMX string operations
>    x86/fpu: Make the EFI FPU calling convention explicit
>    x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

Hi Andy,

I have tested the new patchset on the following CPUs running 5.4.90
(with some adjustments required for it to apply) and 5.10.9 kernels:
  - AMD Phenom(tm) II X3 B77 Processor (family: 0x10, model: 0x4, stepping: 0x3)
  - Intel(R) Xeon(R) CPU 3070  @ 2.66GHz (family: 0x6, model: 0xf, stepping: 0x6)
  - Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz (family: 0x6, model: 0x3a, stepping: 0x9)

For all of them, it was possible to recover a large part of the performance
lost due to the introduction of "Reset MXCSR to default in kernel_fpu_begin"
(figures are performance as a percentage of the original, with vs. without
the patchset):
  - B77: 90% instead of 82% for prefetch64-sse, 92% instead of 84% for generic_sse
  - 3070: 93% instead of 86% for prefetch64-sse, 93% instead of 88% for generic_sse
  - 1280v2: 99% instead of 88% for prefetch64-sse, 99% instead of 88% for generic_sse

For some reason, the 1280v2 (Ivy Bridge) shows almost no remaining
regression for prefetch64-sse and generic_sse. The only issue is that AVX
is still at 67% of its original performance. This is, of course, better
than the 60% without the patchset. There is no AVX on the other two CPUs.

I was using 64-bit kernels for testing; please let me know if testing on
32-bit is also needed.

Tested-by: Krzysztof Piotr Olędzki <ole@....pl>

Thanks,
  Krzysztof
