Open Source and information security mailing list archives
 
Message-ID: <20240624170921.mep2x6pg4aiui4wh@desk>
Date: Mon, 24 Jun 2024 10:09:21 -0700
From: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
To: Jari Ruusu <jariruusu@...tonmail.com>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Stable linux-5.10.x regression triggered by MDS mitigation

On Sun, Jun 23, 2024 at 02:34:12PM +0000, Jari Ruusu wrote:
> I have a 32-bit x86 Linux virtual machine running on
> QEMU-8.2.2+ds-0ubuntu1 with KVM acceleration. The QEMU-emulated
> CPU model is pentium2. The host is 64-bit Linux running on an
> Intel i5-7200U with the latest microcode. Inside that 32-bit x86
> Linux VM I sometimes start dosemu to run some old MS-DOS programs.
> 
> Now dosemu fails to start with a "Segmentation fault" error,
> and this shows up in the dmesg output:
> 
> [   23.768348] general protection fault: 0000 [#1]
> [   23.768353] CPU: 0 PID: 1730 Comm: dosemu.bin Not tainted 5.10.214-test12345 #1
> [   23.768354] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [   23.768358] EIP: restore_all_switch_stack+0xbd/0xc5
> [   23.768359] Code: 10 36 89 48 fc 8b 4c 24 0c 81 e1 ff ff 00 00 36 89 48 f8 8b 4c 24 08 36 89 48 f4 8b 4c 24 04 36 89 48 f0 59 8d 60 f0 58 89 f6 <0f> 00 2d 80 87 5c c1 cf fc 0f a0 50 b8 00 00 00 00 8e e0 8c d0 66
> [   23.768361] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> [   23.768362] ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ffc03fdc
> [   23.768363] DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
> [   23.768366] CR0: 80050033 CR2: 00a33020 CR3: 0fc3b000 CR4: 00000290
> [   23.768368] Call Trace:
> [   23.768371]  ? show_regs+0x5d/0x60
> [   23.768373]  ? __die_body+0x10/0x43
> [   23.768374]  ? die_addr+0x27/0x3c
> [   23.768376]  ? exc_general_protection+0x1e6/0x239
> [   23.768378]  ? exc_bounds+0x8a/0x8a
> [   23.768379]  ? handle_exception+0x147/0x147
> [   23.768381]  ? exc_bounds+0x8a/0x8a
> [   23.768386]  ? restore_all_switch_stack+0xbd/0xc5
> [   23.768388]  ? exc_bounds+0x8a/0x8a
> [   23.768389]  ? restore_all_switch_stack+0xbd/0xc5
> [   23.768390] Modules linked in:
> [   23.768392] ---[ end trace 960c0712f12c2f48 ]---
> [   23.768394] EIP: restore_all_switch_stack+0xbd/0xc5
> [   23.768395] Code: 10 36 89 48 fc 8b 4c 24 0c 81 e1 ff ff 00 00 36 89 48 f8 8b 4c 24 08 36 89 48 f4 8b 4c 24 04 36 89 48 f0 59 8d 60 f0 58 89 f6 <0f> 00 2d 80 87 5c c1 cf fc 0f a0 50 b8 00 00 00 00 8e e0 8c d0 66
> [   23.768396] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> [   23.768397] ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ffc03fdc
> [   23.768398] DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
> [   23.768400] CR0: 80050033 CR2: 00a33020 CR3: 0fc3b000 CR4: 00000290
> 
> Inside that 32-bit VM, kernel.org linux-5.10.214 and earlier
> 5.10.x versions start dosemu normally.
> 
> Inside that 32-bit VM, linux-5.10.215 and later 5.10.x
> versions fail to start dosemu.
> 
> Inside that 32-bit VM, linux-5.10.215 and later 5.10.x
> versions work if "mitigations=off" kernel parameter is
> added to the kernel running inside the VM.
> 
> I have narrowed the problem down: linux-5.10.214 plus the
> following 5 patches triggers that failure. The dmesg output
> above, from the "5.10.214-test12345" kernel, includes these 5
> patches.
> 
> 
> 
> x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=8b20c6f894b7d7d87d3aa1a85cbc7d57378e1346
> 
> x86/bugs: Add asm helpers for executing VERW
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=35e36eac881cddea42ca5fd93facc145a2d5369d
> 
> x86/entry_64: Add VERW just before userspace transition
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=edc702b4a820fc7ffc20e732db1c421cfffbb746
> 
> x86/entry_32: Add VERW just before userspace transition
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=50f021f0b985629accf10481a6e89af8b9700583
> 
> x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=6192d9ed311f70eb7e8ab4a874631a98c5a9217e
> 
> 
> 
> My reading of the above dmesg output indicates that it is the
> VERW opcode inside the CLEAR_CPU_BUFFERS macro in the
> arch/x86/include/asm/nospec-branch.h file that fails. My
> reading of the (possibly outdated) Intel Instruction Set
> Reference indicates this: "The VERR and VERW instructions are
> not recognized in virtual-8086 mode". My understanding is that
> dosemu uses virtual-8086 mode if it is available. Did the
> above patch set just kill virtual-8086 mode and dosemu
> permanently, despite my 32-bit VM kernel .config having
> these?
> 
>  CONFIG_X86_LEGACY_VM86=y
>  CONFIG_VM86=y
> 
> I know that I am doing weird stuff and I understand that
> 32-bit Linux and dosemu are probably something that most
> people don't care about, but this is still a stable
> linux-5.10.x kernel regression that should be fixed.
> 
> Upstream Linux-6.10-rc4 seems to have a similarly functioning
> CLEAR_CPU_BUFFERS macro with a VERW opcode that is used in the
> restore_all_switch_stack code path, so the same problem may
> well affect all newer kernels.
> 
> Pawan Gupta,
> Since you seem to be the author of the above-mentioned
> breakage, is there any chance of you sorting this out?

The patch below (for v6.10-rc5) should fix this. I didn't send it
earlier because I haven't yet had a chance to verify that it works for
other cases, such as modify_ldt().

---
From: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Subject: [PATCH] x86/entry_32: Move CLEAR_CPU_BUFFERS before restoring
 segments

Robert Gill reported the #GP below when dosemu was executing the vm86()
system call:

  general protection fault: 0000 [#1] PREEMPT SMP
  CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
  Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
  EIP: restore_all_switch_stack+0xbe/0xcf
  EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
  ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
  DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
  CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
  Call Trace:
   show_regs+0x70/0x78
   die_addr+0x29/0x70
   exc_general_protection+0x13c/0x348
   exc_bounds+0x98/0x98
   handle_exception+0x14d/0x14d
   exc_bounds+0x98/0x98
   restore_all_switch_stack+0xbe/0xcf
   exc_bounds+0x98/0x98
   restore_all_switch_stack+0xbe/0xcf

This only happens when VERW-based mitigations such as MDS or RFDS are
enabled, because segment registers can hold unusual values after vm86(),
and those values can result in a #GP when VERW is executed. The Intel SDM
vol. 2C documents the following behavior for the VERW instruction:

  #GP(0) - If a memory operand effective address is outside the CS, DS, ES,
	   FS, or GS segment limit.

The CLEAR_CPU_BUFFERS macro executes the VERW instruction before returning
to user space. Add CLEAR_CPU_BUFFERS to the RESTORE_REGS macro, before it
restores the segment registers. In vm86 mode the kernel does not support
the SYSCALL and SYSENTER instructions, so the problem is limited to the
int80 path in 32-bit mode. Leave CLEAR_CPU_BUFFERS in the opportunistic
SYSEXIT path as it is.

Below are the locations where CLEAR_CPU_BUFFERS is done with this patch applied.

* entry_INT80_32(), entry_SYSENTER_32() and interrupts (via
  handle_exception_return) do:

restore_all_switch_stack:
  [...]
  RESTORE_REGS pop=4 clear_cpu_buf=1
   pop    %ebx
   pop    %ecx
   pop    %edx
   pop    %esi
   pop    %edi
   pop    %ebp
   pop    %eax
   verw   0xc0fb0fc0       <-------------
   pop    %ds
   pop    %es
   pop    %fs

* Opportunistic SYSEXIT explicitly does CLEAR_CPU_BUFFERS:

   [...]
   pop    %eax
   verw   0xc0fb0fc0       <-------------
   sti
   sysexit

* NMIs use RESTORE_ALL_NMI -> RESTORE_REGS:

   nmi_return:
   [...]
   RESTORE_ALL_NMI cr3_reg=%edi
   jmp    0xc0fb22e0 <asm_exc_nmi+612>
   test   $0x1000,%edi
   je     0xc0fb22e0 <asm_exc_nmi+612>
   mov    %edi,%cr3
   pop    %ebx
   pop    %ecx
   pop    %edx
   pop    %esi
   pop    %edi
   pop    %ebp
   pop    %eax
   verw   0xc0fb0fc0      <-------------
   pop    %ds
   pop    %es
   pop    %fs

Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
Reported-by: Robert Gill <rtgill82@...il.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/
Suggested-by: Dave Hansen <dave.hansen@...ux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
---
 arch/x86/entry/entry_32.S | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d3a814efbff6..c963abc17a96 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -263,8 +263,15 @@
 	popl	%eax
 .endm
 
-.macro RESTORE_REGS pop=0
+.macro RESTORE_REGS pop=0 clear_cpu_buf=0
 	RESTORE_INT_REGS
+	/*
+	 * CLEAR_CPU_BUFFERS must be done before restoring segment
+	 * registers to avoid #GP when executing VERW in vm86 mode.
+	 */
+	.if \clear_cpu_buf
+	CLEAR_CPU_BUFFERS
+	.endif
 1:	popl	%ds
 2:	popl	%es
 3:	popl	%fs
@@ -299,7 +306,7 @@
 
 	BUG_IF_WRONG_CR3
 
-	RESTORE_REGS pop=\pop
+	RESTORE_REGS pop=\pop clear_cpu_buf=1
 .endm
 
 .macro CHECK_AND_APPLY_ESPFIX
@@ -950,8 +957,7 @@ restore_all_switch_stack:
 	BUG_IF_WRONG_CR3
 
 	/* Restore user state */
-	RESTORE_REGS pop=4			# skip orig_eax/error_code
-	CLEAR_CPU_BUFFERS
+	RESTORE_REGS pop=4 clear_cpu_buf=1	# skip orig_eax/error_code
 .Lirq_return:
 	/*
 	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
@@ -1144,7 +1150,6 @@ SYM_CODE_START(asm_exc_nmi)
 
 	/* Not on SYSENTER stack. */
 	call	exc_nmi
-	CLEAR_CPU_BUFFERS
 	jmp	.Lnmi_return
 
 .Lnmi_from_sysenter_stack:
-- 
2.34.1

