Message-Id: <1429792491-5978-1-git-send-email-dvlasenk@redhat.com>
Date: Thu, 23 Apr 2015 14:34:51 +0200
From: Denys Vlasenko <dvlasenk@...hat.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Denys Vlasenko <dvlasenk@...hat.com>,
Brian Gerst <brgerst@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Steven Rostedt <rostedt@...dmis.org>,
Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
Andy Lutomirski <luto@...capital.net>,
Oleg Nesterov <oleg@...hat.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Alexei Starovoitov <ast@...mgrid.com>,
Will Drewry <wad@...omium.org>,
Kees Cook <keescook@...omium.org>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: [PATCH] x86/asm/entry/32: Restore %ss before SYSRETL if necessary

AMD docs say that SYSRET32 loads the %ss selector with a value from an MSR,
but the *cached descriptor* of %ss is not modified.
(Intel CPUs reset the descriptor to a fixed, valid state.)

This was observed to cause Wine crashes. The conjectured sequence of events
causing it is as follows:

1. Wine process enters the kernel via the syscall insn.
2. Context switch to any other task.
3. An interrupt or exception happens, and the CPU loads %ss with 0.
   (This happens according to both Intel and AMD docs.)
   The %ss cached descriptor is set to the "invalid" state.
4. Context switch back to Wine.
5. sysret to 32-bit userspace. The %ss selector has the correct value, but
   its cached descriptor is still invalid.
6. The very first userspace POP insn after this causes exception 12 (#SS).

Fix this by checking the %ss selector value before SYSRETL. If it is not
__KERNEL_DS (and it really can only be __KERNEL_DS or zero),
load it with __KERNEL_DS.

We also use SYSRET32 for SYSENTER-based syscalls, but that codepath is
only used by Intel CPUs, which don't have this quirk.

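For readers who want to follow the conjectured sequence step by step, here is
a toy C model of it. It is only a sketch under stated assumptions, not kernel
code: the struct, the function names, and the selector constants are all
hypothetical and chosen for illustration, and the "cached descriptor" is
reduced to a single valid/invalid flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative selector values (assumptions, not taken from the patch). */
#define KERNEL_DS 0x18
#define USER32_DS 0x2b

/* Toy segment register: visible selector plus hidden cached-descriptor state. */
struct seg_reg {
    uint16_t selector;
    bool     desc_valid;
};

/* Explicit move to %ss: loads the selector and refreshes the descriptor. */
static void mov_to_ss(struct seg_reg *ss, uint16_t sel)
{
    ss->selector = sel;
    ss->desc_valid = true;
}

/* Step 3: an interrupt or exception in the kernel loads %ss with 0. */
static void kernel_interrupt(struct seg_reg *ss)
{
    ss->selector = 0;
    ss->desc_valid = false;
}

/*
 * Step 5: SYSRET32 always loads the selector. In this model only the
 * non-AMD variant also resets the cached descriptor to a valid state.
 */
static void sysret32(struct seg_reg *ss, bool amd)
{
    ss->selector = USER32_DS;
    if (!amd)
        ss->desc_valid = true;
}

/* The check proposed by the patch: reload %ss only if it is not KERNEL_DS. */
static void restore_ss_if_needed(struct seg_reg *ss)
{
    if (ss->selector != KERNEL_DS)
        mov_to_ss(ss, KERNEL_DS);
}

/* Returns true if the first userspace stack access would succeed. */
bool user_stack_usable(bool amd, bool apply_fix)
{
    struct seg_reg ss;

    mov_to_ss(&ss, KERNEL_DS);      /* steps 1-2: in kernel, %ss is sane */
    kernel_interrupt(&ss);          /* step 3 */
    if (apply_fix)                  /* step 4 onward: back in the Wine task */
        restore_ss_if_needed(&ss);
    sysret32(&ss, amd);             /* step 5 */
    return ss.desc_valid;           /* step 6: invalid means exception 12 */
}
```

In this model, user_stack_usable(true, false) is false (the AMD case without
the fix), while user_stack_usable(true, true) and
user_stack_usable(false, false) are both true.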
Signed-off-by: Denys Vlasenko <dvlasenk@...hat.com>
Reported-by: Brian Gerst <brgerst@...il.com>
CC: Brian Gerst <brgerst@...il.com>
CC: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Steven Rostedt <rostedt@...dmis.org>
CC: Ingo Molnar <mingo@...nel.org>
CC: Borislav Petkov <bp@...en8.de>
CC: "H. Peter Anvin" <hpa@...or.com>
CC: Andy Lutomirski <luto@...capital.net>
CC: Oleg Nesterov <oleg@...hat.com>
CC: Frederic Weisbecker <fweisbec@...il.com>
CC: Alexei Starovoitov <ast@...mgrid.com>
CC: Will Drewry <wad@...omium.org>
CC: Kees Cook <keescook@...omium.org>
CC: x86@...nel.org
CC: linux-kernel@...r.kernel.org
---
arch/x86/ia32/ia32entry.S | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 0c302d0..9537dcb 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -408,6 +408,18 @@ cstar_dispatch:
 sysretl_from_sys_call:
 	andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
 	RESTORE_RSI_RDI_RDX
+	/*
+	 * On AMD, SYSRET32 loads %ss selector, but does not modify its
+	 * cached descriptor; and in kernel, %ss can be loaded with 0,
+	 * setting cached descriptor to "invalid". This has no effect on
+	 * 64-bit mode, but on return to 32-bit mode, it makes stack ops fail.
+	 * Fix %ss only if it's wrong: read from %ss takes ~2 cycles,
+	 * write to %ss is ~40 cycles.
+	 */
+	movl %ss, %ecx
+	cmpl $__KERNEL_DS, %ecx
+	jne reload_ss
+ss_is_good:
 	movl RIP(%rsp),%ecx
 	CFI_REGISTER rip,rcx
 	movl EFLAGS(%rsp),%r11d
@@ -426,6 +438,10 @@ sysretl_from_sys_call:
 	 * does not exist, it merely sets eflags.IF=1).
 	 */
 	USERGS_SYSRET32
+reload_ss:
+	movl $__KERNEL_DS, %ecx
+	movl %ecx, %ss
+	jmp ss_is_good
 #ifdef CONFIG_AUDITSYSCALL
 cstar_auditsys:
--
1.8.1.4