linux-kernel - [PATCH v11 04/20] x86/cpu: Detect TDX partial write machine check erratum

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <86f2a8814240f4bbe850f6a09fc9d0b934979d1b.1685887183.git.kai.huang@intel.com>
Date:   Mon,  5 Jun 2023 02:27:17 +1200
From:   Kai Huang <kai.huang@...el.com>
To:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc:     linux-mm@...ck.org, dave.hansen@...el.com,
        kirill.shutemov@...ux.intel.com, tony.luck@...el.com,
        peterz@...radead.org, tglx@...utronix.de, seanjc@...gle.com,
        pbonzini@...hat.com, david@...hat.com, dan.j.williams@...el.com,
        rafael.j.wysocki@...el.com, ying.huang@...el.com,
        reinette.chatre@...el.com, len.brown@...el.com, ak@...ux.intel.com,
        isaku.yamahata@...el.com, chao.gao@...el.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com, bagasdotme@...il.com,
        sagis@...gle.com, imammedo@...hat.com, kai.huang@...el.com
Subject: [PATCH v11 04/20] x86/cpu: Detect TDX partial write machine check erratum

TDX memory has integrity and confidentiality protections.  Violations of
this integrity protection are supposed to only affect TDX operations and
are never supposed to affect the host kernel itself.  In other words,
the host kernel should never, itself, see machine checks induced by the
TDX integrity hardware.

Alas, the first few generations of TDX hardware have an erratum.  A
"partial" write to a TDX private memory cacheline will silently "poison"
the line.  Subsequent reads will consume the poison and generate a
machine check.  According to the TDX hardware spec, neither of these
things should have happened.

Virtually all kernel memory accesses operations happen in full
cachelines.  In practice, writing a "byte" of memory usually reads a 64
byte cacheline of memory, modifies it, then writes the whole line back.
Those operations do not trigger this problem.

This problem is triggered by "partial" writes where a write transaction
of less than cacheline lands at the memory controller.  The CPU does
these via non-temporal write instructions (like MOVNTI), or through
UC/WC memory mappings.  The issue can also be triggered away from the
CPU by devices doing partial writes via DMA.

With this erratum, there are additional things need to be done around
machine check handler and kexec(), etc.  Similar to other CPU bugs, use
a CPU bug bit to indicate this erratum, and detect this erratum during
early boot.  Note this bug reflects the hardware thus it is detected
regardless of whether the kernel is built with TDX support or not.

Signed-off-by: Kai Huang <kai.huang@...el.com>
---

v10 -> v11:
 - New patch

---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/kernel/cpu/intel.c        | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cb8ca46213be..dc8701f8d88b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -483,5 +483,6 @@
 #define X86_BUG_RETBLEED		X86_BUG(27) /* CPU is affected by RETBleed */
 #define X86_BUG_EIBRS_PBRSB		X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
 #define X86_BUG_SMT_RSB			X86_BUG(29) /* CPU is vulnerable to Cross-Thread Return Address Predictions */
+#define X86_BUG_TDX_PW_MCE		X86_BUG(30) /* CPU may incur #MC if non-TD software does partial write to TDX private memory */

 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 1c4639588ff9..251b333e53d2 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1552,3 +1552,24 @@ u8 get_this_hybrid_cpu_type(void)

 	return cpuid_eax(0x0000001a) >> X86_HYBRID_CPU_TYPE_ID_SHIFT;
 }
+
+/*
+ * These CPUs have an erratum.  A partial write from non-TD
+ * software (e.g. via MOVNTI variants or UC/WC mapping) to TDX
+ * private memory poisons that memory, and a subsequent read of
+ * that memory triggers #MC.
+ */
+static const struct x86_cpu_id tdx_pw_mce_cpu_ids[] __initconst = {
+	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, NULL),
+	{ }
+};
+
+static int __init tdx_erratum_detect(void)
+{
+	if (x86_match_cpu(tdx_pw_mce_cpu_ids))
+		setup_force_cpu_bug(X86_BUG_TDX_PW_MCE);
+
+	return 0;
+}
+early_initcall(tdx_erratum_detect);
-- 
2.40.1