[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200229143644.GA1129@zn.tnic>
Date: Sat, 29 Feb 2020 15:36:44 +0100
From: Borislav Petkov <bp@...en8.de>
To: Yu-cheng Yu <yu-cheng.yu@...el.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Tony Luck <tony.luck@...el.com>,
Andy Lutomirski <luto@...nel.org>,
Rik van Riel <riel@...riel.com>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Fenghua Yu <fenghua.yu@...el.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2 8/8] x86/fpu/xstate: Restore supervisor xstates for
__fpu__restore_sig()
On Fri, Feb 28, 2020 at 02:13:29PM -0800, Yu-cheng Yu wrote:
> If the XSAVES buffer already has current data (i.e. TIF_NEED_FPU_LOAD is
> set), then skip copy_xregs_to_kernel(). This happens when the task was
> context-switched out and has not returned to user-mode.
So I got tired of this peacemeal game back'n'forth and went and did your
work for ya.
First of all, on my fairly new KBL test box, the context size is almost
a kB:
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
Then, I added this ontop of your patchset:
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 0d3e06a772b0..2e57b8d79c0e 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -337,6 +337,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
*/
fpregs_lock();
if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+ trace_printk("!NEED_FPU_LOAD, size: %d, supervisor: 0x%llx\n",
+ size, xfeatures_mask_supervisor());
if (xfeatures_mask_supervisor())
copy_xregs_to_kernel(&fpu->state.xsave);
set_thread_flag(TIF_NEED_FPU_LOAD);
and traced a fairly boring kernel build workload where the kernel
.config is not even a distro one but a tailored for this machine.
Which means, it took 3m35.058s to build and the trace buffer had 53973
entries like this one:
bash-1211 [002] ...1 648.238585: __fpu__restore_sig: !NEED_FPU_LOAD, size: 1092, supervisor: 0x0
which means I have
53973 / (3*60 + 35) =~ 251 XSAVES invocations per second!
And this only during this single workload - I don't even wanna imagine
what that number would be if it were a huge, overloaded box with a
signal heavy workload.
And all this overhead to save 16 + 24 bytes supervisor states and throw
away the rest up to 960 bytes each time.
Err, I don't think so.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists