lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue,  5 May 2015 19:50:25 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	linux-kernel@...r.kernel.org
Cc:	Andy Lutomirski <luto@...capital.net>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Fenghua Yu <fenghua.yu@...el.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: [PATCH 133/208] x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX synchronization with FP exceptions

So we have the following ancient code in copy_fpregs_to_fpstate():

	if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES)) {
		asm volatile("fnclex");
		goto drop_fpregs;
	}

which clears pending FPU exceptions and then drops registers, which
causes the next FP instruction of the saved context to re-load the
saved FPU state, with all pending exceptions marked properly, and
will re-start the exception handling mechanism in the hardware.

Since FPU exceptions are always issued on instruction boundaries,
in particular on the next FP instruction following the exception
generating instruction, there's no fear of getting an FP exception
asynchronously.

They were truly asynchronous back in the IRQ13 days, when the FPU was
a weird and expensive co-processor that did its own processing, and we
had to synchronize with them, but that code is not working anymore:
we don't have IRQ13 mapped in the IDT anymore.

With the introduction of optimized XSAVE support there's a new
complication: if the xstate features bit indicates that a particular
state component is unused (in 'init state'), then the hardware does
not guarantee that the XSAVE (et al) instruction keeps the underlying
FPU state image in memory valid and current. In practice this means
that the hardware won't write it, and the exceptions flag in the
state might be an older version, with it still being set. This
meant that we had to check the xfeatures flag as well, adding
another memory load and branch to a critical hot path of the scheduler.

So optimize all this by removing both the old quirk and the new check,
and straight-line optimizing the most common cases with likely()
hints. Quite a bit of code gets removed this way:

  arch/x86/kernel/process_64.o:

    text    data     bss     dec     filename
    5484       8       0    5492     process_64.o.before
    5416       8       0    5424     process_64.o.after

Now there's also a chance that some weird behavior or erratum was
masked by our IRQ13 handling quirk (or that I misunderstood the
nature of the quirk), and that this change triggers some badness.

There's no real good way to protect against that possibility other
than keeping this change well isolated, well commented and well
bisectable. If you bisect a weird (or not so weird) breakage to
this commit then please let us know!

Reviewed-by: Borislav Petkov <bp@...en8.de>
Cc: Andy Lutomirski <luto@...capital.net>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: Fenghua Yu <fenghua.yu@...el.com>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Oleg Nesterov <oleg@...hat.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Signed-off-by: Ingo Molnar <mingo@...nel.org>
---
 arch/x86/include/asm/fpu/internal.h | 40 ++++++++++------------------------------
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 11055f51e67a..10663b02ee22 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -271,46 +271,26 @@ static inline void fpu_fxsave(struct fpu *fpu)
  * The legacy FNSAVE instruction cleared all FPU state
  * unconditionally, so registers are essentially destroyed.
  * Modern FPU state can be kept in registers, if there are
- * no pending FP exceptions. (Note the FIXME below.)
+ * no pending FP exceptions.
  */
 static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
 {
-	if (use_xsave()) {
+	if (likely(use_xsave())) {
 		xsave_state(&fpu->state->xsave);
+		return 1;
+	}
 
-		/*
-		 * xsave header may indicate the init state of the FP.
-		 */
-		if (!(fpu->state->xsave.header.xfeatures & XSTATE_FP))
-			goto keep_fpregs;
-	} else {
-		if (use_fxsr()) {
-			fpu_fxsave(fpu);
-		} else {
-			/* FNSAVE always clears FPU registers: */
-			asm volatile("fnsave %[fx]; fwait"
-				     : [fx] "=m" (fpu->state->fsave));
-			goto drop_fpregs;
-		}
+	if (likely(use_fxsr())) {
+		fpu_fxsave(fpu);
+		return 1;
 	}
 
 	/*
-	 * If exceptions are pending, we need to clear them so
-	 * that we don't randomly get exceptions later.
-	 *
-	 * FIXME! Is this perhaps only true for the old-style
-	 * irq13 case? Maybe we could leave the x87 state
-	 * intact otherwise?
+	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
+	 * so we have to mark them inactive:
 	 */
-	if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES)) {
-		asm volatile("fnclex");
-		goto drop_fpregs;
-	}
-
-keep_fpregs:
-	return 1;
+	asm volatile("fnsave %[fx]; fwait" : [fx] "=m" (fpu->state->fsave));
 
-drop_fpregs:
 	return 0;
 }
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ