Date: Mon, 2 May 2016 20:32:16 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: Andy Lutomirski <luto@...capital.net>,
	Yu-cheng Yu <yu-cheng.yu@...el.com>, X86 ML <x86@...nel.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Andy Lutomirski <luto@...nel.org>, Borislav Petkov <bp@...e.de>,
	Sai Praneeth Prakhya <sai.praneeth.prakhya@...el.com>,
	"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
	Fenghua Yu <fenghua.yu@...el.com>
Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues


* Dave Hansen <dave.hansen@...ux.intel.com> wrote:

> On 04/30/2016 12:53 AM, Ingo Molnar wrote:
> > We can still use the compacted area handling instructions, because presumably
> > those are the fastest and are also the most optimized ones? But I wouldn't use
> > them to do dynamic allocation: just allocate the maximum possible FPU save area at
> > task creation time and never again worry about that detail.
> >
> > Ok?
>
> Sounds sane to me.
>
> BTW, I hacked up your "fpu performance" to compare XSAVE vs. XSAVES:
>
> > [ 0.048347] x86/fpu: Cost of: XSAVE insn : 127 cycles
> > [ 0.049134] x86/fpu: Cost of: XSAVES insn : 113 cycles
> > [ 0.048492] x86/fpu: Cost of: XRSTOR insn : 120 cycles
> > [ 0.049267] x86/fpu: Cost of: XRSTORS insn : 102 cycles
>
> So I guess we can add that to the list of things that XSAVES is good for.

Absolutely!

> [...] Granted, the real-world benefit is probably hard to measure because the
> cache residency of the XSAVE buffer isn't as good when _actually_ context
> switching, but this at least shows a small theoretical advantage for XSAVES.

Yeah, and anything that was measured for real is far from being theoretical. It's
simply a best-case microbenchmark figure, but it's still a nice 10+ cycles
improvement overall - which might become bigger in future CPU generations.

Thanks,

	Ingo
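
For reference, the "10+ cycles" figure can be checked directly against the four
quoted log lines. The following is only a minimal sketch of that arithmetic: the
cycle counts are copied from the log above, while the names `COSTS` and `saving`
are illustrative and do not come from the kernel or the patch series.

```python
# Cycle counts copied from the quoted "x86/fpu: Cost of:" log lines.
COSTS = {"XSAVE": 127, "XSAVES": 113, "XRSTOR": 120, "XRSTORS": 102}


def saving(old, new):
    """Return (cycles saved, percent saved) when replacing `old` with `new`.

    Hypothetical helper for illustration only.
    """
    delta = COSTS[old] - COSTS[new]
    return delta, 100.0 * delta / COSTS[old]


if __name__ == "__main__":
    pairs = [("XSAVE", "XSAVES"), ("XRSTOR", "XRSTORS")]
    total = 0
    for old, new in pairs:
        delta, pct = saving(old, new)
        total += delta
        print(f"{old} -> {new}: {delta} cycles saved ({pct:.1f}%)")
    # Combined saving for one save plus one restore.
    print(f"save + restore: {total} cycles saved")
```

On these numbers the save side gains 14 cycles and the restore side 18, so one
save/restore pair is 32 cycles cheaper in this best-case microbenchmark. As noted
in the thread, cache residency during real context switches may change the picture.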