Date: Mon, 18 Jan 2021 00:25:16 -0800
From: Krzysztof Olędzki <ole@....pl>
To: Andy Lutomirski <luto@...nel.org>, x86@...nel.org
Cc: LKML <linux-kernel@...r.kernel.org>,
    Krzysztof Mazur <krzysiek@...lesie.net>,
    Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage

On 2021-01-17 at 22:20, Andy Lutomirski wrote:
> This series fixes two regressions: a boot failure on AMD K7 and a
> performance regression on everything.
>
> I did a double-take here -- the regressions were reported by different
> people, both named Krzysztof :)
>
> Andy Lutomirski (4):
>   x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
>   x86/mmx: Use KFPU_MMX for MMX string operations
>   x86/fpu: Make the EFI FPU calling convention explicit
>   x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

Thank you so much Andy! What a coincidence!

Sadly, my AMD K7 is sitting somewhere in a closet, on a different
continent, and was running Linux for the last time over 10 years ago. :/
However, I can offer some testing on different AMD & Intel CPUs.

Now... It is 12 AM here so I tested it very quickly only on 5.4-stable,
where I initially noticed the problem. The patch applies almost cleanly
to this release; "almost" because arch/x86/platform/efi/efi_64.c there
does not have a kernel_fpu_begin() call to update.

The kernel compiles and boots.
Here is the result for:
Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz (family: 0x6, model: 0x3a, stepping: 0x9)

5.4-stable (with "Reset MXCSR to default in kernel_fpu_begin"):
   avx       : 21072.000 MB/sec
   prefetch64-sse: 20392.000 MB/sec
   generic_sse: 18572.000 MB/sec
xor: using function: avx (21072.000 MB/sec)

5.4-stable-c4db485dd3f2378b4923503aed995f7816e265b7-revert:
   avx       : 33764.000 MB/sec
   prefetch64-sse: 23432.000 MB/sec
   generic_sse: 21036.000 MB/sec
xor: using function: avx (33764.000 MB/sec)

5.4-stable-kernel_fpu_begin_mask:
   avx       : 23576.000 MB/sec
   prefetch64-sse: 23024.000 MB/sec
   generic_sse: 20880.000 MB/sec
xor: using function: avx (23576.000 MB/sec)

So, the performance regression for prefetch64-sse and generic_sse is
almost gone, but the AVX code is still impacted. Not as much as before,
but still noticeably, and it is now barely better than fixed prefetch64-sse.

I'm going to test the patches on 5.10 / 5.11-rc to make sure what I have
seen on 5.4 is not due to wrong backporting, and on different CPUs.
However, this may have to wait until Tuesday / Wednesday due to family
duties, as Monday is a holiday here.

Best regards,
 Krzysztof Olędzki
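
The summary of the numbers above can be checked by expressing each result as a fraction of the throughput measured with the commit reverted. The following is only a minimal sketch in Python: the throughput values are copied from this message, while the names `RESULTS`, `relative_to_revert` and the kernel labels `regressed`, `reverted` and `patched` are made up for illustration and are not part of the kernel or of the patch series.

```python
# Throughput in MB/sec, copied from the xor benchmark output in this message.
# The dictionary keys are illustrative labels, not kernel identifiers:
#   "regressed" = 5.4-stable with "Reset MXCSR to default in kernel_fpu_begin"
#   "reverted"  = 5.4-stable with commit c4db485dd3f2... reverted (baseline)
#   "patched"   = 5.4-stable with the kernel_fpu_begin_mask series applied
RESULTS = {
    "regressed": {"avx": 21072.0, "prefetch64-sse": 20392.0, "generic_sse": 18572.0},
    "reverted": {"avx": 33764.0, "prefetch64-sse": 23432.0, "generic_sse": 21036.0},
    "patched": {"avx": 23576.0, "prefetch64-sse": 23024.0, "generic_sse": 20880.0},
}


def relative_to_revert(kernel: str, func: str) -> float:
    """Return the throughput of `func` on `kernel` as a fraction of the baseline."""
    return RESULTS[kernel][func] / RESULTS["reverted"][func]


if __name__ == "__main__":
    for func in ("avx", "prefetch64-sse", "generic_sse"):
        before = relative_to_revert("regressed", func)
        after = relative_to_revert("patched", func)
        print(f"{func:15s} regressed: {before:6.1%}  patched: {after:6.1%}")
```

With these figures, the two SSE variants come back to within about 2% of the reverted kernel, while AVX stays at roughly 70% of it, which matches the observation that the AVX path is still noticeably slower.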