linux-kernel - Re: [PATCH v5 25/28] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state

Message-ID: <6197fd94-76a9-a391-f290-7001a71add7f@kernel.org>
Date:   Sun, 23 May 2021 20:25:35 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     "Chang S. Bae" <chang.seok.bae@...el.com>, bp@...e.de,
        tglx@...utronix.de, mingo@...nel.org, x86@...nel.org
Cc:     len.brown@...el.com, dave.hansen@...el.com, jing2.liu@...el.com,
        ravi.v.shankar@...el.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 25/28] x86/fpu/xstate: Skip writing zeros to signal
 frame for dynamic user states if in INIT-state

On 5/23/21 12:32 PM, Chang S. Bae wrote:
> By default, for xstate features in the INIT-state, XSAVE writes zeros to
> the uncompressed destination buffer.
> 
> E.g., if you are not using AVX-512, you will still get a bunch of zeros on
> the signal stack where live AVX-512 data would go.
> 
> For 'dynamic user state' (currently only XTILEDATA), explicitly skip this
> data transfer. The result is that the user buffer for the AMX region will
> not be touched by XSAVE.

Why?

> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@...el.com>
> Reviewed-by: Len Brown <len.brown@...el.com>
> Cc: x86@...nel.org
> Cc: linux-kernel@...r.kernel.org
> ---
> Changes from v4:
> * Added as a new patch.
> ---
>  arch/x86/include/asm/fpu/internal.h | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 4a3436684805..131f2557fc85 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -354,11 +354,27 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
>   */
>  static inline int copy_xregs_to_user(struct xregs_state __user *buf)
>  {
> -	u64 mask = current->thread.fpu.state_mask;
> -	u32 lmask = mask;
> -	u32 hmask = mask >> 32;
> +	u64 state_mask = current->thread.fpu.state_mask;
> +	u64 dynamic_state_mask;
> +	u32 lmask, hmask;
>  	int err;
>  
> +	dynamic_state_mask = state_mask & xfeatures_mask_user_dynamic;
> +	if (dynamic_state_mask && boot_cpu_has(X86_FEATURE_XGETBV1)) {
> +		u64 dynamic_xinuse, dynamic_init;
> +		u64 xinuse = xgetbv(1);
> +
> +		dynamic_xinuse = xinuse & dynamic_state_mask;
> +		dynamic_init = ~(xinuse) & dynamic_state_mask;
> +		if (dynamic_init) {
> +			state_mask &= ~xfeatures_mask_user_dynamic;
> +			state_mask |= dynamic_xinuse;

That's a long-winded way to say:

state_mask &= ~dynamic_init;
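
A quick user-space sanity check of that equivalence, as a sketch only.
The helper names (mask_as_patched, mask_simplified,
masks_agree_exhaustively) and the dynamic_features parameter, which
stands in for xfeatures_mask_user_dynamic, are hypothetical and are not
kernel functions. The xinuse argument stands in for the value the patch
reads with xgetbv(1).

```c
#include <stdint.h>

/* Mask computation as written in the quoted patch (XGETBV1 path). */
static uint64_t mask_as_patched(uint64_t state_mask,
				uint64_t dynamic_features,
				uint64_t xinuse)
{
	uint64_t dynamic_state_mask = state_mask & dynamic_features;
	uint64_t dynamic_xinuse = xinuse & dynamic_state_mask;
	uint64_t dynamic_init = ~xinuse & dynamic_state_mask;

	if (dynamic_init) {
		state_mask &= ~dynamic_features;
		state_mask |= dynamic_xinuse;
	}
	return state_mask;
}

/* The shorter form: clear only the dynamic bits that are in INIT state. */
static uint64_t mask_simplified(uint64_t state_mask,
				uint64_t dynamic_features,
				uint64_t xinuse)
{
	uint64_t dynamic_init = ~xinuse & state_mask & dynamic_features;

	return state_mask & ~dynamic_init;
}

/*
 * Compare both forms for every combination of 6-bit inputs.
 * The operations are purely bitwise, so each bit position behaves
 * independently and a small exhaustive check covers all cases.
 * Returns 1 if the two forms always agree, 0 otherwise.
 */
static int masks_agree_exhaustively(void)
{
	uint64_t s, d, x;

	for (s = 0; s < 64; s++)
		for (d = 0; d < 64; d++)
			for (x = 0; x < 64; x++)
				if (mask_as_patched(s, d, x) !=
				    mask_simplified(s, d, x))
					return 0;
	return 1;
}
```

Without XGETBV1 neither form runs, so state_mask is left unchanged and
the zeros are written as before.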

But what happens if we don't have the XGETBV1 feature?  Are we making
AMX support depend on XGETBV1?

How does this patch interact with "[PATCH v5 24/28] x86/fpu/xstate: Use
per-task xstate mask for saving xstate in signal frame"?  They seem to
be trying to do something similar but not quite the same, and they seem
to be patching the same function.  The result seems odd.

Finally, isn't part of the point that we need to avoid even *allocating*
space for non-AMX-using tasks?  That would require writing out the
compacted format and/or fiddling with XCR0.