lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <ee5185e1727c3cd8bd51dbf9fcec95d432100d12.1670566861.git.kai.huang@intel.com>
Date:   Fri,  9 Dec 2022 19:52:36 +1300
From:   Kai Huang <kai.huang@...el.com>
To:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc:     linux-mm@...ck.org, dave.hansen@...el.com, peterz@...radead.org,
        tglx@...utronix.de, seanjc@...gle.com, pbonzini@...hat.com,
        dan.j.williams@...el.com, rafael.j.wysocki@...el.com,
        kirill.shutemov@...ux.intel.com, ying.huang@...el.com,
        reinette.chatre@...el.com, len.brown@...el.com,
        tony.luck@...el.com, ak@...ux.intel.com, isaku.yamahata@...el.com,
        chao.gao@...el.com, sathyanarayanan.kuppuswamy@...ux.intel.com,
        bagasdotme@...il.com, sagis@...gle.com, imammedo@...hat.com,
        kai.huang@...el.com
Subject: [PATCH v8 15/16] x86/virt/tdx: Flush cache in kexec() when TDX is enabled

There are two problems in terms of using kexec() to boot to a new kernel
when the old kernel has enabled TDX: 1) Part of the memory pages are
still TDX private pages (i.e. metadata used by the TDX module, and any
TDX guest memory if kexec() happens when there's any TDX guest alive).
2) There might be dirty cachelines associated with TDX private pages.

Because the hardware doesn't guarantee cache coherency among different
KeyIDs, the old kernel needs to flush cache (of those TDX private pages)
before booting to the new kernel.  Also, reading TDX private page using
any shared non-TDX KeyID with integrity-check enabled can trigger #MC.
Therefore ideally, the kernel should convert all TDX private pages back
to normal before booting to the new kernel.

However, this implementation doesn't convert TDX private pages back to
normal in kexec() because of below considerations:

1) Neither the kernel nor the TDX module has existing infrastructure to
   track which pages are TDX private pages.
2) The number of TDX private pages can be large, and converting all of
   them (cache flush + using MOVDIR64B to clear the page) in kexec() can
   be time consuming.
3) The new kernel will almost only use KeyID 0 to access memory.  KeyID
   0 doesn't support integrity-check, so it's OK.
4) The kernel doesn't (and may never) support MKTME.  If any 3rd party
   kernel ever supports MKTME, it can/should do MOVDIR64B to clear the
   page with the new MKTME KeyID (just like TDX does) before using it.

Therefore, this implementation just flushes cache to make sure there are
no stale dirty cachelines associated with any TDX private KeyIDs before
booting to the new kernel, otherwise they may silently corrupt the new
kernel.

Following SME support, use wbinvd() to flush cache in stop_this_cpu().
Theoretically, cache flush is only needed when the TDX module has been
initialized.  However initializing the TDX module is done on demand at
runtime, and it takes a mutex to read the module status.  Just check
whether TDX is enabled by BIOS instead to flush cache.

Reviewed-by: Isaku Yamahata <isaku.yamahata@...el.com>
Signed-off-by: Kai Huang <kai.huang@...el.com>
---

v7 -> v8:
 - Changelog:
   - Removed "leave TDX module open" part due to shut down patch has been
     removed.

v6 -> v7:
 - Improved changelog to explain why don't convert TDX private pages back
   to normal.

---
 arch/x86/kernel/process.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c21b7347a26d..0cc84977dc62 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -765,8 +765,14 @@ void __noreturn stop_this_cpu(void *dummy)
 	 *
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
+	 *
+	 * Similar to SME, if the TDX module is ever initialized, the
+	 * cachelines associated with any TDX private KeyID must be flushed
+	 * before transiting to the new kernel.  The TDX module is initialized
+	 * on demand, and it takes the mutex to read its status.  Just check
+	 * whether TDX is enabled by BIOS instead to flush cache.
 	 */
-	if (cpuid_eax(0x8000001f) & BIT(0))
+	if (cpuid_eax(0x8000001f) & BIT(0) || platform_tdx_enabled())
 		native_wbinvd();
 	for (;;) {
 		/*
-- 
2.38.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ