lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 14 Jan 2021 13:36:57 +0100
From:   Krzysztof Mazur <krzysiek@...lesie.net>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] x86/lib: don't use MMX before FPU initialization

On Thu, Jan 14, 2021 at 10:44:25AM +0100, Borislav Petkov wrote:
> I believe the correct fix should be
> 
> 	if (unlikely(in_interrupt()) || !(cr4_read_shadow() & X86_CR4_OSFXSR))
> 		return __memcpy(to, from, len);
> 
> in _mmx_memcpy() as you had it in your first patch.
> 

The OSFXSR must be set only on CPUs with SSE. There
are some CPUs with 3DNow!, but without SSE and FXSR (like AMD
Geode LX, which is still used in many embedded systems).
So, I've changed that to:

if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
		unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))


However, I'm not sure that adding two taken branches (that
second unlikely() does not help gcc to generate better code) and
a call to cr4_read_shadow() (not inlined) is a good idea.
The static branch adds a single jump, which is changed to
NOP. The code with boot_cpu_has() and CR4 checks adds:

c09932ad:	a1 50 50 c3 c0       	mov    0xc0c35050,%eax
c09932b2:	a9 00 00 00 02       	test   $0x2000000,%eax
/* this branch is taken */
c09932b7:	0f 85 13 01 00 00    	jne    c09933d0 <_mmx_memcpy+0x140>

c09932bd:	/* MMX code is here */

/* cr4_read_shadow is not inlined */
c09933d0:	e8 8b 01 78 ff       	call   c0113560 <cr4_read_shadow>
c09933d5:	f6 c4 02             	test   $0x2,%ah
/* this branch is also taken */
c09933d8:	0f 85 df fe ff ff    	jne    c09932bd <_mmx_memcpy+0x2d>


However, except for some embedded systems, probably almost nobody uses
that code today, and the most imporant is avoiding future breakage. And
I'm not sure which approach is better. Because that CR4 test tests
for a feature that is not used in mmx_memcpy(), but it's used in
kernel_fpu_begin(). And in future kernel_fpu_begin() may change and
require also other stuff. So I think that the best approach would be
delay any FPU optimized memcpy() after fpu__init_system() is executed.

Best regards,
Krzysiek
-- >8 --
Subject: [PATCH] x86/lib: don't use mmx_memcpy() to early

The MMX 3DNow! optimized memcpy() is used very early,
even before FPU is initialized in the kernel. It worked fine, but commit
7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR
to default in kernel_fpu_begin()") broke that. After that
commit the kernel_fpu_begin() assumes that FXSR is enabled in
the CR4 register on all processors with SSE. Because memcpy() is used
before FXSR is enabled, the kernel crashes just after "Booting the kernel."
message. It affects all kernels with CONFIG_X86_USE_3DNOW (enabled when
some AMD/Cyrix processors are selected) on processors with SSE
(like AMD K7, which supports both MMX 3DNow! and SSE).

Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Borislav Petkov <bp@...en8.de>
Cc: "H. Peter Anvin" <hpa@...or.com>
Signed-off-by: Krzysztof Mazur <krzysiek@...lesie.net>
---
 arch/x86/lib/mmx_32.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..70aa769570e6 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -25,13 +25,20 @@
 
 #include <asm/fpu/api.h>
 #include <asm/asm.h>
+#include <asm/tlbflush.h>
 
 void *_mmx_memcpy(void *to, const void *from, size_t len)
 {
 	void *p;
 	int i;
 
-	if (unlikely(in_interrupt()))
+	/*
+	 * kernel_fpu_begin() assumes that FXSR is enabled on all processors
+	 * with SSE. Thus, MMX-optimized version can't be used
+	 * before the kernel enables FXSR (OSFXSR bit in the CR4 register).
+	 */
+	if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
+			unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))
 		return __memcpy(to, from, len);
 
 	p = to;
-- 
2.27.0.rc1.207.gb85828341f

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ