Message-ID: <8374a887-9bde-c7c0-ace2-0afe22f1f616@amd.com>
Date: Mon, 21 Jul 2025 09:50:47 -0500
From: Tom Lendacky <thomas.lendacky@....com>
To: Kai Huang <kai.huang@...el.com>, dave.hansen@...el.com, bp@...en8.de,
tglx@...utronix.de, peterz@...radead.org, mingo@...hat.com, hpa@...or.com
Cc: x86@...nel.org, kas@...nel.org, rick.p.edgecombe@...el.com,
dwmw@...zon.co.uk, linux-kernel@...r.kernel.org, pbonzini@...hat.com,
seanjc@...gle.com, kvm@...r.kernel.org, reinette.chatre@...el.com,
isaku.yamahata@...el.com, dan.j.williams@...el.com, ashish.kalra@....com,
nik.borisov@...e.com, chao.gao@...el.com, sagis@...gle.com
Subject: Re: [PATCH v4 0/7] TDX host: kexec/kdump support
On 7/21/25 08:08, Tom Lendacky wrote:
> On 7/17/25 16:46, Kai Huang wrote:
>> This series is the latest attempt to support kexec on a TDX host,
>> following Dave's suggestion to use a percpu boolean to control WBINVD
>> during kexec.
>>
>> Hi Boris/Tom,
>>
>> As requested, I added a first patch that consolidates the last two
>> 'unsigned int' parameters of relocate_kernel() into a single 'unsigned
>> int' that carries flags. Patch 2 (patch 1 in v3) is updated accordingly.
>> Would you help to review? Thanks.
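
The consolidation described above can be pictured with a small userspace
sketch. This is only an illustration under stated assumptions, not the code
from the patch: the SKETCH_* flag names and the helper functions are
hypothetical, and the two booleans simply stand for the two former
'unsigned int' parameters.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the names used in the actual patch may differ. */
#define SKETCH_PRESERVE_CONTEXT     (1u << 0)
#define SKETCH_HOST_MEM_ENC_ACTIVE  (1u << 1)

/* Fold what used to be two separate parameters into one flags word. */
unsigned int sketch_pack_flags(bool preserve_context, bool host_mem_enc_active)
{
	unsigned int flags = 0;

	if (preserve_context)
		flags |= SKETCH_PRESERVE_CONTEXT;
	if (host_mem_enc_active)
		flags |= SKETCH_HOST_MEM_ENC_ACTIVE;
	return flags;
}

/* Each consumer tests its own bit, not whether the whole word is non-zero. */
bool sketch_preserve_context(unsigned int flags)
{
	return (flags & SKETCH_PRESERVE_CONTEXT) != 0;
}

bool sketch_needs_wbinvd(unsigned int flags)
{
	return (flags & SKETCH_HOST_MEM_ENC_ACTIVE) != 0;
}
```

Once two conditions share one word, a test of the form "is the word
non-zero" no longer identifies either condition, so every check has to mask
out its own bit.
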
>>
>> I tested that both normal kexec and preserve_context kexec work (using
>> tools/testing/selftests/kexec/test_kexec_jump.sh), but I don't have an
>> SME-capable machine to test on.
>>
>> Hi Tom, I added your Reviewed-by and Tested-by in patch 2 anyway, since
>> I believe the change is trivial and straightforward. But because of the
>> cleanup patch, I would appreciate it if you could test the first two
>> patches again. Thanks a lot!
>
> Everything is working, Thanks!

See my comments in patch #1. I didn't test with context preservation, so
that bit was never set. If it was, I think things would have failed.

Thanks,
Tom

>
> Tom
>
>>
>> v3 -> v4:
>>  - Rebase to latest tip/master.
>>  - Add a cleanup patch to consolidate relocate_kernel()'s last two
>>    function parameters -- Boris.
>>  - Address comments received -- please see individual patches.
>>  - Collect tags (Tom, Rick, binbin).
>>
>> v3: https://lore.kernel.org/kvm/cover.1750934177.git.kai.huang@intel.com/
>>
>> v2 -> v3 (all trivial changes):
>>
>>  - Rebase on latest tip/master
>>  - change to use __always_inline for do_seamcall() in patch 2
>>  - Update patch 2 (changelog and code comment) to remove the sentence
>>    which says "not all SEAMCALLs generate dirty cachelines of TDX
>>    private memory but just treat all of them do." -- Dave.
>>  - Add Farrah's Tested-by for all TDX patches.
>>
>> v2 had one informal RFC patch appended to show "some optimization"
>> which can move WBINVD from the kexec phase to an earlier stage in KVM.
>> Paolo commented on and Acked that patch (thanks!), so v3 made it a
>> formal patch (patch 6). Technically it is not strictly needed in this
>> series and could be done in the future.
>>
>> More history info can be found in v2:
>>
>> https://lore.kernel.org/lkml/cover.1746874095.git.kai.huang@intel.com/
>>
>> === More information ===
>>
>> TDX private memory is memory that is encrypted with private Host Key IDs
>> (HKID). If the kernel has ever enabled TDX, part of system memory
>> remains TDX private memory when kexec happens. E.g., the PAMT (Physical
>> Address Metadata Table) pages used by the TDX module to track each TDX
>> memory page's state are never freed once the TDX module is initialized.
>> TDX guests also have guest private memory and secure-EPT pages.
>>
>> After kexec, the new kernel will have no knowledge of which memory pages
>> were used as TDX private pages and can use all memory as regular memory.
>>
>> 1) Cache flush
>>
>> Per TDX 1.5 base spec "8.6.1.Platforms not Using ACT: Required Cache
>> Flush and Initialization by the Host VMM", to support kexec for TDX, the
>> kernel needs to flush the cache to make sure no dirty cachelines of TDX
>> private memory are left over for the new kernel (when the TDX module
>> reports TDX_FEATURES.CLFLUSH_BEFORE_ALLOC as 1 in the global metadata for
>> the platform). The kernel also needs to make sure there is no more TDX
>> activity (no SEAMCALL) after the cache flush, so that no new dirty
>> cachelines of TDX private memory are generated.
>>
>> SME has a similar requirement. SME kexec support uses WBINVD to do the
>> cache flush. WBINVD is able to flush cachelines associated with any
>> HKID. Reuse the WBINVD introduced by SME to flush the cache for TDX.
>>
>> Currently the kernel explicitly checks whether the hardware supports SME
>> and only does WBINVD if true. Instead of adding yet another TDX
>> specific check, this series uses a percpu boolean to indicate whether
>> WBINVD is needed on that CPU during kexec.
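
A rough userspace sketch of the per-CPU flag idea described above. It is an
illustration under stated assumptions, not the series' implementation: a
plain array stands in for a kernel per-CPU variable, a counter stands in
for the WBINVD instruction, and every sketch_* name is hypothetical.
Clearing the flag after the flush is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

/* Stand-in for a per-CPU boolean (the kernel would use a percpu variable). */
static bool sketch_cache_incoherent[SKETCH_NR_CPUS];

/* Counts simulated WBINVD executions per CPU; no real flush happens here. */
static unsigned int sketch_wbinvd_count[SKETCH_NR_CPUS];

/*
 * Any feature that can leave dirty cachelines behind sets the flag on the
 * CPU it runs on, e.g. an SME-capable host at CPU bring-up, or a TDX host
 * before it makes a SEAMCALL.
 */
void sketch_mark_incoherent(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	sketch_cache_incoherent[cpu] = true;
}

/* The kexec path checks only the flag, never a feature-specific condition. */
void sketch_stop_cpu_for_kexec(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (sketch_cache_incoherent[cpu]) {
		sketch_wbinvd_count[cpu]++;		/* simulated WBINVD */
		sketch_cache_incoherent[cpu] = false;
	}
}

unsigned int sketch_flush_count(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return sketch_wbinvd_count[cpu];
}
```

The point of the indirection is that the kexec path does not need to know
whether SME or TDX asked for the flush; a CPU that never set the flag skips
WBINVD entirely.
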
>>
>> 2) Reset TDX private memory using MOVDIR64B
>>
>> The TDX spec (the aforementioned section) also suggests the kernel
>> *should* use MOVDIR64B to clear a TDX private page before the kernel
>> reuses it as a regular one.
>>
>> However, in reality the situation can be more flexible. Per TDX 1.5
>> base spec ("Table 16.2: Non-ACT Platforms Checks on Memory Reads in Ci
>> Mode" and "Table 16.3: Non-ACT Platforms Checks on Memory Reads in Li
>> Mode"), reading or writing TDX private memory using a shared KeyID
>> without integrity checking enabled will neither poison the memory nor
>> cause a machine check.
>>
>> Note that on platforms with ACT (Access Control Table), there is no
>> integrity check involved, so no machine check can happen due to memory
>> reads/writes using different KeyIDs.
>>
>> KeyID 0 (TME key) doesn't support integrity checking. This series
>> chooses NOT to reset TDX private memory but to leave it as-is for the
>> new kernel. As mentioned above, in practice it is safe to do so.
>>
>> 3) One limitation
>>
>> If the kernel has ever enabled TDX, after kexec the new kernel won't be
>> able to use TDX anymore. This is because when the new kernel tries to
>> initialize the TDX module, it will fail on the first SEAMCALL because
>> the module has already been initialized by the old kernel.
>>
>> More (non-trivial) work will be needed for the new kernel to use TDX,
>> e.g., one solution is to just reload the TDX module from the location
>> where BIOS loads the TDX module (/boot/efi/EFI/TDX/). This series
>> doesn't cover this, but leaves it as future work.
>>
>> 4) Kdump support
>>
>> This series also enables kdump with TDX, but no special handling is
>> needed for crash kexec (except turning on the Kconfig option):
>>
>> - kdump kernel uses reserved memory from the old kernel as system ram,
>> and the old kernel will never use the reserved memory as TDX memory.
>> - /proc/vmcore contains TDX private memory pages. It's meaningless to
>> read them, but it doesn't do any harm either.
>>
>> 5) TDX "partial write machine check" erratum
>>
>> On platforms with the TDX erratum, a partial write (a write transaction
>> of less than a cacheline that lands at the memory controller) to TDX
>> private memory poisons that memory, and a subsequent read triggers a
>> machine check. On those platforms, the kernel needs to reset TDX private
>> memory before jumping to the new kernel; otherwise the new kernel may
>> see an unexpected machine check.
>>
>> The kernel currently doesn't track which page is TDX private memory.
>> It's not trivial to reset TDX private memory. For simplicity, this
>> series simply disables kexec/kdump for such platforms. This can be
>> enhanced in the future.
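
The policy in the paragraph above amounts to a single early check. A minimal
userspace sketch, assuming a hypothetical platform descriptor and assuming
-EOPNOTSUPP as the error code (the cover letter does not say which error is
returned or where the check lives):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical descriptor; the kernel would consult a CPU bug flag instead. */
struct sketch_platform {
	bool has_tdx_partial_write_erratum;
};

/*
 * Refuse to load a kexec/kdump image on affected platforms, because the
 * kernel cannot cheaply find and reset every TDX private page first.
 * Returns 0 when loading may proceed, a negative errno value otherwise.
 */
int sketch_kexec_load_check(const struct sketch_platform *p)
{
	if (p->has_tdx_partial_write_erratum)
		return -EOPNOTSUPP;	/* assumed error code */
	return 0;
}
```

Rejecting the load up front trades functionality for simplicity: no page
tracking is needed, at the cost of losing kexec and kdump on those
platforms.
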
>>
>>
>>
>> Kai Huang (7):
>>   x86/kexec: Consolidate relocate_kernel() function parameters
>>   x86/sme: Use percpu boolean to control WBINVD during kexec
>>   x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
>>   x86/kexec: Disable kexec/kdump on platforms with TDX partial write
>>     erratum
>>   x86/virt/tdx: Remove the !KEXEC_CORE dependency
>>   x86/virt/tdx: Update the kexec section in the TDX documentation
>>   KVM: TDX: Explicitly do WBINVD when no more TDX SEAMCALLs
>>
>>  Documentation/arch/x86/tdx.rst       | 14 ++++-----
>>  arch/x86/Kconfig                     |  1 -
>>  arch/x86/include/asm/kexec.h         | 12 ++++++--
>>  arch/x86/include/asm/processor.h     |  2 ++
>>  arch/x86/include/asm/tdx.h           | 31 +++++++++++++++++++-
>>  arch/x86/kernel/cpu/amd.c            | 17 +++++++++++
>>  arch/x86/kernel/machine_kexec_64.c   | 43 ++++++++++++++++++++++------
>>  arch/x86/kernel/process.c            | 24 +++++++---------
>>  arch/x86/kernel/relocate_kernel_64.S | 30 +++++++++++--------
>>  arch/x86/kvm/vmx/tdx.c               | 12 ++++++++
>>  arch/x86/virt/vmx/tdx/tdx.c          | 16 +++++++++--
>>  11 files changed, 155 insertions(+), 47 deletions(-)
>>
>>
>> base-commit: e180b3a224cb519388c2f61ca7bc1eaf94cec1fb