lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 25 Jan 2018 13:07:43 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     dwmw@...zon.co.uk, tim.c.chen@...ux.intel.com, pjt@...gle.com,
        jikos@...nel.org, gregkh@...ux-foundation.org,
        dave.hansen@...el.com, mingo@...nel.org, riel@...hat.com,
        luto@...capital.net, torvalds@...ux-foundation.org,
        ak@...ux.intel.com, keescook@...gle.com, jpoimboe@...hat.com,
        peterz@...radead.org, tglx@...utronix.de, hpa@...or.com,
        linux-kernel@...r.kernel.org
Cc:     linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/pti] x86/retpoline: Fill return stack buffer on vmexit

On Fri, Jan 12, 2018 at 03:37:49AM -0800, tip-bot for David Woodhouse wrote:
> +/*
> + * On VMEXIT we must ensure that no RSB predictions learned in the guest
> + * can be followed in the host, by overwriting the RSB completely. Both
> + * retpoline and IBRS mitigations for Spectre v2 need this; only on future
> + * CPUs with IBRS_ATT *might* it be avoided.
> + */
> +static inline void vmexit_fill_RSB(void)
> +{
> +#ifdef CONFIG_RETPOLINE
> +	unsigned long loops = RSB_CLEAR_LOOPS / 2;
> +
> +	asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
> +		      ALTERNATIVE("jmp 910f",
> +				  __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
> +				  X86_FEATURE_RETPOLINE)
> +		      "910:"
> +		      : "=&r" (loops), ASM_CALL_CONSTRAINT
> +		      : "r" (loops) : "memory" );
> +#endif
> +}

Btw,

this thing is a real pain to backport to older kernels without breaking
kABI (I know, I know, latter sucks but it is what it is) so I'm thinking
we could simplify that thing regardless.

As it is, 41 bytes get replaced currently:

[    0.437005] apply_alternatives: feat: 7*32+12, old: (        (ptrval), len: 43), repl: (        (ptrval), len: 43), pad: 41
[    0.438885]         (ptrval): old_insn: eb 29 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[    0.440001]         (ptrval): rpl_insn: 48 c7 c0 10 00 00 00 e8 07 00 00 00 f3 90 0f ae e8 eb f9 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 ff c8 75 e3 48 81 c4 00 01 00 00
[    0.444002]         (ptrval): final_insn: 48 c7 c0 10 00 00 00 e8 07 00 00 00 f3 90 0f ae e8 eb f9 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 ff c8 75 e3 48 81 c4 00 01 00 00

Not that it doesn't work but the less bytes we replace, the better.

So it could be made to simply call two functions. The replacing then
turns into:

[    0.438154] apply_alternatives: feat: 7*32+12, old: (ffffffff81060b60 len: 5), repl: (ffffffff82434692, len: 5), pad: 0
[    0.440002] ffffffff81060b60: old_insn: e8 ab 73 01 00
[    0.441003] ffffffff82434692: rpl_insn: e8 89 38 c4 fe
[    0.441966] apply_alternatives: Fix CALL offset: 0x173bb, CALL 0xffffffff81077f20
[    0.444002] ffffffff81060b60: final_insn: e8 bb 73 01 00

The "old_insn" above is:
ffffffff81060b60:       e8 ab 73 01 00          callq  ffffffff81077f10 <__fill_rsb_nop>

and final_insn is
ffffffff82434692:       e8 89 38 c4 fe          callq  ffffffff81077f20 <__fill_rsb>

so both CALLs with immediates for which there's no speculation going on.

I had to add a new alternative_void_call() macro for that but that's
straight-forward. Also, this causes objtool to segfault so Josh and I
need to look at that first.

Other than that, it gets a lot simpler this way:

--
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index cf5961ca8677..a84863c1b7d3 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -210,6 +210,11 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
 		: output : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)
 
+/* Like alternative_io, but for replacing a direct call with another one. */
+#define alternative_void_call(oldfunc, newfunc, feature, input...)		\
+	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature)	\
+		: : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)
+
 /*
  * Like alternative_call, but there are two features and respective functions.
  * If CPU has feature2, function2 is used.
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4ad41087ce0e..5ba951c208af 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
-#ifndef __NOSPEC_BRANCH_H__
-#define __NOSPEC_BRANCH_H__
+#ifndef __X86_ASM_NOSPEC_BRANCH_H__
+#define __X86_ASM_NOSPEC_BRANCH_H__
 
 #include <asm/alternative.h>
 #include <asm/alternative-asm.h>
@@ -206,17 +206,9 @@ extern char __indirect_thunk_end[];
 static inline void vmexit_fill_RSB(void)
 {
 #ifdef CONFIG_RETPOLINE
-	unsigned long loops;
-
-	asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
-		      ALTERNATIVE("jmp 910f",
-				  __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
-				  X86_FEATURE_RETPOLINE)
-		      "910:"
-		      : "=r" (loops), ASM_CALL_CONSTRAINT
-		      : : "memory" );
+	alternative_void_call(__fill_rsb_nop, __fill_rsb, X86_FEATURE_RETPOLINE, ASM_NO_INPUT_CLOBBER("memory"));
 #endif
 }
 
 #endif /* __ASSEMBLY__ */
-#endif /* __NOSPEC_BRANCH_H__ */
+#endif /* __X86_ASM_NOSPEC_BRANCH_H__ */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index d3a67fba200a..b6a002bf893b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -971,4 +971,9 @@ bool xen_set_default_idle(void);
 
 void stop_this_cpu(void *dummy);
 void df_debug(struct pt_regs *regs, long error_code);
+
+#ifdef CONFIG_RETPOLINE
+void __fill_rsb_nop(void);
+void __fill_rsb(void);
+#endif
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 390b3dc3d438..16cc2e73d17d 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -281,3 +281,19 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
 	return sprintf(buf, "%s\n", spectre_v2_strings[spectre_v2_enabled]);
 }
 #endif
+
+#ifdef CONFIG_RETPOLINE
+void __fill_rsb_nop(void)
+{
+	cpu_relax();
+}
+
+void __fill_rsb(void)
+{
+	unsigned long loops;
+
+	asm volatile (__stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1))
+		      : "=r" (loops), ASM_CALL_CONSTRAINT
+		      : : "memory" );
+}
+#endif

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ