linux-kernel - Re: [PATCH v2 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage


Message-ID: <e4b495a8-a23a-3c8c-e9c0-3f23b21d41a4@ans.pl>
Date:   Tue, 19 Jan 2021 23:51:32 -0800
From:   Krzysztof Olędzki <ole@....pl>
To:     Andy Lutomirski <luto@...nel.org>, x86@...nel.org
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Krzysztof Mazur <krzysiek@...lesie.net>,
        Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH v2 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage

On 2021-01-19 at 09:38, Andy Lutomirski wrote:
> This series fixes two regressions: a boot failure on AMD K7 and a
> performance regression on everything.
> 
> I did a double-take here -- the regressions were reported by different
> people, both named Krzysztof :)
> 
> Changes from v1:
>   - Fix MMX better -- MMX really does need FNINIT.
>   - Improve the EFI code.
>   - Rename the KFPU constants.
>   - Changelog improvements.
> 
> Andy Lutomirski (4):
>    x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
>    x86/mmx: Use KFPU_387 for MMX string operations
>    x86/fpu: Make the EFI FPU calling convention explicit
>    x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

Hi Andy.

I have quickly tested the new version on an E3-1280 V2.

* 5.10.9 + 7ad816762f9bf89e940e618ea40c43138b479e10 reverted (aka unfixed)
xor: measuring software checksum speed
    avx             : 38616 MB/sec
    prefetch64-sse  : 25797 MB/sec
    generic_sse     : 23147 MB/sec
xor: using function: avx (38616 MB/sec)

* 5.10.9 (the original)
xor: measuring software checksum speed
    avx             : 23318 MB/sec
    prefetch64-sse  : 22562 MB/sec
    generic_sse     : 20431 MB/sec
xor: using function: avx (23318 MB/sec)

* 5.10.9 + "Reduce unnecessary FNINIT and MXCSR usage" v2
xor: measuring software checksum speed
    avx             : 26451 MB/sec
    prefetch64-sse  : 25777 MB/sec
    generic_sse     : 23178 MB/sec
xor: using function: avx (26451 MB/sec)

Overall, the kernel xor benchmark reports better performance on 5.10.9
than on 5.4.90 (see my previous e-mail), but the general trend is the same.

The "unfixed" kernel is faster for all of avx, prefetch64-sse and
generic_sse (by roughly 66%, 14% and 13% respectively). With the fix,
i.e. on the original 5.10.9, we see the expected perf regression.

Now, with your patchset, both prefetch64-sse and generic_sse are able to
recover the full performance, as seen on 5.4. However, this is not the
case for avx. While there is still an improvement (26451 vs. 23318
MB/sec), it is nowhere close to the 38616 MB/sec of the unfixed kernel.
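
To put the three runs side by side, here is a small sketch that expresses each result as a percentage of the "unfixed" kernel. The MB/sec figures are copied from the logs above; the `RESULTS` table and the `percent_of_baseline` helper are just illustrative names, not anything from the kernel or the patchset.

```python
# MB/sec figures copied from the three xor benchmark runs in this message.
# The dictionary keys are informal labels chosen for this sketch.
RESULTS = {
    "unfixed": {"avx": 38616, "prefetch64-sse": 25797, "generic_sse": 23147},
    "5.10.9": {"avx": 23318, "prefetch64-sse": 22562, "generic_sse": 20431},
    "5.10.9+v2": {"avx": 26451, "prefetch64-sse": 25777, "generic_sse": 23178},
}


def percent_of_baseline(results, kernel, baseline="unfixed"):
    """Return each routine's throughput on `kernel` as a percentage of `baseline`."""
    return {
        routine: 100.0 * results[kernel][routine] / results[baseline][routine]
        for routine in results[baseline]
    }


if __name__ == "__main__":
    for kernel in ("5.10.9", "5.10.9+v2"):
        for routine, pct in percent_of_baseline(RESULTS, kernel).items():
            print(f"{kernel:10s} {routine:15s} {pct:6.1f}% of unfixed")
```

With these numbers, the v2 patchset brings both SSE routines back to about 100% of the unfixed kernel, while avx stays at roughly 68%.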

I wonder why AVX still sees a regression, and whether anything more can
be done about it.

Will do more tests tomorrow.

Thanks,
  Krzysztof