Date:   Mon, 18 Jan 2021 00:25:16 -0800
From:   Krzysztof Olędzki <ole@....pl>
To:     Andy Lutomirski <luto@...nel.org>, x86@...nel.org
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Krzysztof Mazur <krzysiek@...lesie.net>,
        Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage

On 2021-01-17 at 22:20, Andy Lutomirski wrote:
> This series fixes two regressions: a boot failure on AMD K7 and a
> performance regression on everything.
> 
> I did a double-take here -- the regressions were reported by different
> people, both named Krzysztof :)
> 
> Andy Lutomirski (4):
>    x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
>    x86/mmx: Use KFPU_MMX for MMX string operations
>    x86/fpu: Make the EFI FPU calling convention explicit
>    x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

Thank you so much Andy!

What a coincidence! Sadly, my AMD K7 is sitting somewhere in a closet, 
on a different continent, and was running Linux for the last time over 
10 years ago. :/ However, I can offer some testing on different AMD & 
Intel CPUs.

Now... It is 12 AM here so I tested it very quickly only on 5.4-stable, 
where I initially noticed the problem. The patch applies almost cleanly 
in this release; "almost" because arch/x86/platform/efi/efi_64.c there 
has no kernel_fpu_begin() call to update. The kernel compiles and boots.

Here is the result for:
  Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz
  (family: 0x6, model: 0x3a, stepping: 0x9)

5.4-stable (with "Reset MXCSR to default in kernel_fpu_begin"):
     avx       : 21072.000 MB/sec
     prefetch64-sse: 20392.000 MB/sec
     generic_sse: 18572.000 MB/sec
xor: using function: avx (21072.000 MB/sec)

5.4-stable-c4db485dd3f2378b4923503aed995f7816e265b7-revert:
     avx       : 33764.000 MB/sec
     prefetch64-sse: 23432.000 MB/sec
     generic_sse: 21036.000 MB/sec
xor: using function: avx (33764.000 MB/sec)

5.4-stable-kernel_fpu_begin_mask:
     avx       : 23576.000 MB/sec
     prefetch64-sse: 23024.000 MB/sec
     generic_sse: 20880.000 MB/sec
xor: using function: avx (23576.000 MB/sec)

So, the performance regression for prefetch64-sse and generic_sse is 
almost gone, but the AVX code is still impacted. Not as much as before, 
but still noticeably, and AVX is now barely faster than the fixed 
prefetch64-sse.
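
For anyone who wants the relative numbers, here is a small Python sketch 
that computes the slowdown of each xor function against the revert 
kernel. The throughput values are copied from the output above; the 
dictionary keys and the slowdown_pct() helper are just names made up for 
this illustration, not anything from the kernel or the patches.

```python
# Throughput in MB/sec, copied from the xor benchmark output above.
# The kernel labels are hypothetical shorthand:
#   revert         = 5.4-stable with c4db485dd3f2 reverted (baseline)
#   mxcsr_reset    = 5.4-stable with "Reset MXCSR to default in kernel_fpu_begin"
#   fpu_begin_mask = 5.4-stable with the kernel_fpu_begin_mask series applied
results = {
    "avx":            {"revert": 33764.0, "mxcsr_reset": 21072.0, "fpu_begin_mask": 23576.0},
    "prefetch64-sse": {"revert": 23432.0, "mxcsr_reset": 20392.0, "fpu_begin_mask": 23024.0},
    "generic_sse":    {"revert": 21036.0, "mxcsr_reset": 18572.0, "fpu_begin_mask": 20880.0},
}


def slowdown_pct(baseline, measured):
    """Return how much slower `measured` is than `baseline`, in percent."""
    return (baseline - measured) / baseline * 100.0


for name, r in results.items():
    before = slowdown_pct(r["revert"], r["mxcsr_reset"])
    after = slowdown_pct(r["revert"], r["fpu_begin_mask"])
    print(f"{name:15s} before fix: {before:5.2f}%  after fix: {after:5.2f}%")
```

On these numbers AVX goes from roughly 38% slower than the revert kernel 
to roughly 30% slower, while both SSE variants end up less than 2% 
slower. This is a single quick run on one CPU, so treat the percentages 
as indicative only.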

I'm going to test the patches on 5.10 / 5.11-rc to make sure what I have 
seen on 5.4 is not due to wrong backporting, and on different CPUs. 
However, this may have to wait until Tuesday / Wednesday due to family 
duties, as Monday is a holiday here.

Best regards,
  Krzysztof Olędzki
