lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 25 Jan 2018 12:20:49 +0000
From:   David Woodhouse <dwmw2@...radead.org>
To:     Borislav Petkov <bp@...en8.de>, tim.c.chen@...ux.intel.com,
        pjt@...gle.com, jikos@...nel.org, gregkh@...ux-foundation.org,
        dave.hansen@...el.com, mingo@...nel.org, riel@...hat.com,
        luto@...capital.net, torvalds@...ux-foundation.org,
        ak@...ux.intel.com, keescook@...gle.com, jpoimboe@...hat.com,
        peterz@...radead.org, tglx@...utronix.de, hpa@...or.com,
        linux-kernel@...r.kernel.org
Cc:     linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/pti] x86/retpoline: Fill return stack buffer on vmexit

On Thu, 2018-01-25 at 13:07 +0100, Borislav Petkov wrote:
> On Fri, Jan 12, 2018 at 03:37:49AM -0800, tip-bot for David Woodhouse wrote:
> > 
> > +/*
> > + * On VMEXIT we must ensure that no RSB predictions learned in the guest
> > + * can be followed in the host, by overwriting the RSB completely. Both
> > + * retpoline and IBRS mitigations for Spectre v2 need this; only on future
> > + * CPUs with IBRS_ATT *might* it be avoided.
> > + */
> > +static inline void vmexit_fill_RSB(void)
> > +{
> > +#ifdef CONFIG_RETPOLINE
> > +	unsigned long loops = RSB_CLEAR_LOOPS / 2;
> > +
> > +	asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
> > +		      ALTERNATIVE("jmp 910f",
> > +				  __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
> > +				  X86_FEATURE_RETPOLINE)
> > +		      "910:"
> > +		      : "=&r" (loops), ASM_CALL_CONSTRAINT
> > +		      : "r" (loops) : "memory" );
> > +#endif
> > +}
> Btw,
> 
> this thing is a real pain to backport to older kernels without breaking
> kABI (I know, I know, latter sucks but it is what it is) 

I haven't had lunch yet, so I don't feel queasy and I'm vaguely
interested... *why* does it break kABI?

> so I'm thinking we could simplify that thing regardless.
> 
> As it is, 41 bytes get replaced currently:
> 
> [    0.437005] apply_alternatives: feat: 7*32+12, old: (        (ptrval), len: 43), repl: (        (ptrval), len: 43), pad: 41
> [    0.438885]         (ptrval): old_insn: eb 29 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> [    0.440001]         (ptrval): rpl_insn: 48 c7 c0 10 00 00 00 e8 07 00 00 00 f3 90 0f ae e8 eb f9 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 ff c8 75 e3 48 81 c4 00 01 00 00
> [    0.444002]         (ptrval): final_insn: 48 c7 c0 10 00 00 00 e8 07 00 00 00 f3 90 0f ae e8 eb f9 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 ff c8 75 e3 48 81 c4 00 01 00 00
> 
> Not that it doesn't work but the less bytes we replace, the better.
> 
> So it could be made to simply call two functions. The replacing then
> turns into:
> 
> [    0.438154] apply_alternatives: feat: 7*32+12, old: (ffffffff81060b60 len: 5), repl: (ffffffff82434692, len: 5), pad: 0
> [    0.440002] ffffffff81060b60: old_insn: e8 ab 73 01 00
> [    0.441003] ffffffff82434692: rpl_insn: e8 89 38 c4 fe
> [    0.441966] apply_alternatives: Fix CALL offset: 0x173bb, CALL 0xffffffff81077f20
> [    0.444002] ffffffff81060b60: final_insn: e8 bb 73 01 00
> 
> The "old_insn" above is:
> ffffffff81060b60:       e8 ab 73 01
> 00          callq  ffffffff81077f10 <__fill_rsb_nop>
> 
> and final_insn is
> ffffffff82434692:       e8 89 38 c4
> fe          callq  ffffffff81077f20 <__fill_rsb>
> 
> so both CALLs with immediates for which there's no speculation going
> on.
> 
> I had to add a new alternative_void_call() macro for that but that's
> straight-forward. Also, this causes objtool to segfault so Josh and I
> need to look at that first.
> 
> Other than that, it gets a lot simpler this way:
> 
> --
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index cf5961ca8677..a84863c1b7d3 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -210,6 +210,11 @@ static inline int alternatives_text_reserved(void *start, void *end)
>  	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
>  		: output : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)
>  
> +/* Like alternative_io, but for replacing a direct call with another one. */
> +#define alternative_void_call(oldfunc, newfunc, feature, input...)		\
> +	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature)	\
> +		: : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)

But you aren't doing the call at all in the other case, and
alternatives *always* handled the case where the first 'alternative'
instruction was a branch, didn't it?

So couldn't it just be alternative(nop, call __fill_rsb_func)?

But I still don't understand why it matters.

> +void __fill_rsb(void)
> +{
> +	unsigned long loops;
> +
> +	asm volatile (__stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1))
> +		      : "=r" (loops), ASM_CALL_CONSTRAINT
> +		      : : "memory" );
> +}
> +#endif

The out-of-line function should be __clear_rsb() if it's using
RSB_CLEAR_LOOPS, and __fill_rsb() if it's using RSB_FILL_LOOPS. I think
we're only using one right now but Andi at least is posting patches
which use the other, as part of the Skylake clusterfuck.
Download attachment "smime.p7s" of type "application/x-pkcs7-signature" (5213 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ