Message-Id: <20211001223728.9309-24-chang.seok.bae@intel.com>
Date:   Fri,  1 Oct 2021 15:37:22 -0700
From:   "Chang S. Bae" <chang.seok.bae@...el.com>
To:     bp@...e.de, luto@...nel.org, tglx@...utronix.de, mingo@...nel.org,
        x86@...nel.org
Cc:     len.brown@...el.com, lenb@...nel.org, dave.hansen@...el.com,
        thiago.macieira@...el.com, jing2.liu@...el.com,
        ravi.v.shankar@...el.com, linux-kernel@...r.kernel.org,
        chang.seok.bae@...el.com
Subject: [PATCH v11 23/29] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state

By default, for XSTATE features in the INIT-state, XSAVE writes zeros to
the uncompressed destination buffer.

E.g., if you are not using AVX-512, you will still get a bunch of zeros on
the signal stack where live AVX-512 data would go.

For permission-required states (currently AMX state), explicitly skip this
data transfer. The result is that the user buffer for the AMX region will
not be touched by XSAVE.

[ Reading XINUSE takes about 20-30 cycles, but writing zeros costs about
  five times as much or more, e.g., for XTILEDATA. ]

Signed-off-by: Chang S. Bae <chang.seok.bae@...el.com>
Reviewed-by: Len Brown <len.brown@...el.com>
Cc: x86@...nel.org
Cc: linux-kernel@...r.kernel.org
---
Changes from v10:
* Simplify the sigframe XSAVE code: replace check for XFD STATE with
  XTILECFG and later STATE.

Changes from v9:
* Use cpu_feature_enabled() instead of boot_cpu_has(). (Borislav Petkov)

Changes from v5:
* Mentioned the optimization trade-offs in the changelog. (Dave Hansen)
* Added code comment.

Changes from v4:
* Added as a new patch.
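
For reviewers, the mask selection described in the changelog can be modelled
in userspace roughly as below. This is only a sketch under stated
assumptions, not the kernel code: sigframe_xsave_mask() and its parameters
are hypothetical names, perm_mask stands in for xfeatures_mask_user_perm(),
xinuse stands in for the value of xgetbv(1) (XCR0 & XINUSE), and the
XFEATURE_MASK_* values are redefined locally from the architectural bit
numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural XSAVE component bits, redefined here for illustration. */
#define XFEATURE_MASK_FP          (1ULL << 0)
#define XFEATURE_MASK_SSE         (1ULL << 1)
#define XFEATURE_MASK_YMM         (1ULL << 2)
#define XFEATURE_MASK_XTILE_CFG   (1ULL << 17)
#define XFEATURE_MASK_XTILE_DATA  (1ULL << 18)

/*
 * Hypothetical model of the requested-feature bitmap used for the
 * signal frame XSAVE:
 *  - states that need no permission are always requested, so XSAVE
 *    still writes zeros for those that are in INIT-state;
 *  - permission-required states are requested only when the frame is
 *    expanded and the state is actually in use, so an INIT-state
 *    region is left untouched in the user buffer.
 */
static uint64_t sigframe_xsave_mask(uint64_t uabi_mask, uint64_t perm_mask,
				    uint64_t xinuse, bool expanded)
{
	uint64_t mask = uabi_mask & ~perm_mask;

	if (expanded)
		mask |= uabi_mask & xinuse & perm_mask;

	return mask;
}
```

With uabi_mask covering FP, SSE, YMM, XTILECFG and XTILEDATA, perm_mask set
to XTILEDATA only, and xinuse reporting just FP, SSE and YMM, the function
returns a mask without XTILEDATA, so that region of the buffer is skipped.
Once XTILEDATA shows up in xinuse, it is included again.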
---
 arch/x86/include/asm/fpu/internal.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 06be4c247c97..5f013fa0b205 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -355,8 +355,12 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
 		mask = uabi_mask & ~xfeatures_mask_user_perm();
 
 		if (sig_xstate_expanded(current)) {
-			u64 cur_uabi_mask = uabi_mask & current->thread.fpu.state_mask;
+			u64 cur_uabi_mask;
 
+			if (cpu_feature_enabled(X86_FEATURE_XGETBV1))
+				cur_uabi_mask = uabi_mask & xgetbv(1);
+			else
+				cur_uabi_mask = uabi_mask & current->thread.fpu.state_mask;
 			mask |= cur_uabi_mask & xfeatures_mask_user_perm();
 		}
 	}
-- 
2.17.1
