Message-ID: <f999349e-accb-dcd6-75f4-eb36e0dda79f@amd.com>
Date: Mon, 21 Jul 2025 08:08:51 -0500
From: Tom Lendacky <thomas.lendacky@....com>
To: Kai Huang <kai.huang@...el.com>, dave.hansen@...el.com, bp@...en8.de,
 tglx@...utronix.de, peterz@...radead.org, mingo@...hat.com, hpa@...or.com
Cc: x86@...nel.org, kas@...nel.org, rick.p.edgecombe@...el.com,
 dwmw@...zon.co.uk, linux-kernel@...r.kernel.org, pbonzini@...hat.com,
 seanjc@...gle.com, kvm@...r.kernel.org, reinette.chatre@...el.com,
 isaku.yamahata@...el.com, dan.j.williams@...el.com, ashish.kalra@....com,
 nik.borisov@...e.com, chao.gao@...el.com, sagis@...gle.com
Subject: Re: [PATCH v4 0/7] TDX host: kexec/kdump support

On 7/17/25 16:46, Kai Huang wrote:
> This series is the latest attempt to support kexec on TDX host following
> Dave's suggestion to use a percpu boolean to control WBINVD during
> kexec.
> 
> Hi Boris/Tom,
> 
> As requested, I added the first patch to clean up the last two 'unsigned
> int' parameters of relocate_kernel() by merging them into one 'unsigned
> int' that carries flags.  Patch 2 (patch 1 in v3) is also updated based
> on that.  Could you help review it?  Thanks.
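
A minimal userspace sketch of the consolidation described above: two
boolean-like 'unsigned int' arguments folded into a single flags word.
The flag names and helpers are hypothetical, chosen for illustration
only; they are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the names used by the real patch may differ. */
#define SKETCH_PRESERVE_CONTEXT    (1u << 0)
#define SKETCH_HOST_MEM_ENC_ACTIVE (1u << 1)

/* Fold two boolean-like arguments into a single flags word. */
static unsigned int sketch_pack_flags(bool preserve_context,
                                      bool host_mem_enc_active)
{
    unsigned int flags = 0;

    if (preserve_context)
        flags |= SKETCH_PRESERVE_CONTEXT;
    if (host_mem_enc_active)
        flags |= SKETCH_HOST_MEM_ENC_ACTIVE;
    return flags;
}

/* Test one known flag bit in the flags word. */
static bool sketch_flag_set(unsigned int flags, unsigned int bit)
{
    assert(bit == SKETCH_PRESERVE_CONTEXT ||
           bit == SKETCH_HOST_MEM_ENC_ACTIVE);
    return (flags & bit) != 0;
}
```

With this shape the callee takes one argument and tests individual
bits, so adding a further condition later does not change the function
signature.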
> 
> I tested that both normal kexec and preserve_context kexec work (using
> tools/testing/selftests/kexec/test_kexec_jump.sh).  But I don't have an
> SME-capable machine to test on.
> 
> Hi Tom, I added your Reviewed-by and Tested-by in patch 2 anyway since
> I believe the change is trivial and straightforward.  But due to the
> cleanup patch, I would appreciate it if you could help test the first
> two patches again.  Thanks a lot!

Everything is working, Thanks!

Tom

> 
> v3 -> v4:
>  - Rebase to latest tip/master.
>  - Add a cleanup patch to consolidate relocate_kernel()'s last two
>    function parameters -- Boris.
>  - Address comments received -- please see individual patches.
>  - Collect tags (Tom, Rick, binbin).
> 
>  v3: https://lore.kernel.org/kvm/cover.1750934177.git.kai.huang@intel.com/
> 
> v2 -> v3 (all trivial changes):
> 
>  - Rebase on latest tip/master
>    - change to use __always_inline for do_seamcall() in patch 2
>  - Update patch 2 (changelog and code comment) to remove the sentence
>    which says "not all SEAMCALLs generate dirty cachelines of TDX
>    private memory but just treat all of them do."  -- Dave.
>  - Add Farrah's Tested-by for all TDX patches.
> 
> The v2 had one informal RFC patch appended to show "some optimization"
> which can move WBINVD from the kexec phase to an early stage in KVM.
> Paolo commented on and Acked that patch (thanks!), so v3 made it a
> formal patch (patch 6).  Technically it is not strictly needed in this
> series and could be done in the future.
> 
> More history info can be found in v2:
> 
>  https://lore.kernel.org/lkml/cover.1746874095.git.kai.huang@intel.com/
> 
> === More information ===
> 
> TDX private memory is memory that is encrypted with private Host Key IDs
> (HKID).  If the kernel has ever enabled TDX, part of system memory
> remains TDX private memory when kexec happens.  E.g., the PAMT (Physical
> Address Metadata Table) pages used by the TDX module to track each TDX
> memory page's state are never freed once the TDX module is initialized.
> TDX guests also have guest private memory and secure-EPT pages.
> 
> After kexec, the new kernel will have no knowledge of which memory pages
> were used as TDX private pages and can use all memory as regular memory.
> 
> 1) Cache flush
> 
> Per TDX 1.5 base spec "8.6.1.Platforms not Using ACT: Required Cache
> Flush and Initialization by the Host VMM", to support kexec for TDX, the
> kernel needs to flush the cache to make sure no dirty cachelines of TDX
> private memory are left over for the new kernel (when the TDX module
> reports TDX_FEATURES.CLFLUSH_BEFORE_ALLOC as 1 in the global metadata for
> the platform).  The kernel also needs to make sure there's no more TDX
> activity (no SEAMCALL) after cache flush so that no new dirty cachelines
> of TDX private memory are generated.
> 
> SME has a similar requirement.  SME kexec support uses WBINVD to do the
> cache flush.  WBINVD is able to flush cachelines associated with any
> HKID.  Reuse the WBINVD introduced by SME to flush cache for TDX.
> 
> Currently the kernel explicitly checks whether the hardware supports SME
> and only does WBINVD if true.  Instead of adding yet another
> TDX-specific check, this series uses a percpu boolean to indicate whether
> WBINVD is needed on that CPU during kexec.
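
The per-CPU boolean idea above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: the array stands
in for a real percpu variable, a counter stands in for the WBINVD
instruction, and every name here is hypothetical rather than taken from
the series.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Stand-in for a percpu variable: one flag per CPU. */
static bool sketch_cache_incoherent[SKETCH_NR_CPUS];

/* Counts simulated WBINVD executions. */
static unsigned int sketch_wbinvd_count;

/*
 * Set the flag on a CPU that may hold dirty cachelines needing a flush,
 * e.g. when SME is active or before that CPU issues a SEAMCALL.
 */
static void sketch_mark_incoherent(unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    sketch_cache_incoherent[cpu] = true;
}

/* Run on each CPU on the kexec path: flush only if that CPU is flagged. */
static void sketch_kexec_flush(unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    if (sketch_cache_incoherent[cpu]) {
        sketch_wbinvd_count++;   /* real code would execute WBINVD here */
        sketch_cache_incoherent[cpu] = false;
    }
}
```

The kexec path then needs no feature-specific check: any feature that
can leave dirty cachelines sets the flag, and the flush site only reads
it.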
> 
> 2) Reset TDX private memory using MOVDIR64B
> 
> The TDX spec (the aforementioned section) also suggests the kernel
> *should* use MOVDIR64B to clear a TDX private page before the kernel
> reuses it as a regular one.
> 
> However, in reality the situation can be more flexible.  Per TDX 1.5
> base spec ("Table 16.2: Non-ACT Platforms Checks on Memory Reads in Ci
> Mode" and "Table 16.3: Non-ACT Platforms Checks on Memory Reads in Li
> Mode"), reads and writes to TDX private memory using a shared KeyID
> without integrity check enabled will not poison the memory or cause a
> machine check.
> 
> Note that on platforms with ACT (Access Control Table), there's no
> integrity check involved, thus no machine check can happen due to
> memory reads/writes using different KeyIDs.
> 
> KeyID 0 (TME key) doesn't support integrity check.  This series chooses
> to NOT reset TDX private memory but leave TDX private memory as-is to the
> new kernel.  As mentioned above, in practice it is safe to do so.
> 
> 3) One limitation
> 
> If the kernel has ever enabled TDX, after kexec the new kernel won't be
> able to use TDX anymore.  This is because when the new kernel tries to
> initialize the TDX module it will fail on the first SEAMCALL, since the
> module has already been initialized by the old kernel.
> 
> More (non-trivial) work will be needed for the new kernel to use TDX,
> e.g., one solution is to just reload the TDX module from the location
> where BIOS loads the TDX module (/boot/efi/EFI/TDX/).  This series
> doesn't cover this, but leaves it as future work.
> 
> 4) Kdump support
> 
> This series also enables kdump with TDX, but no special handling is
> needed for crash kexec (except turning on the Kconfig option):
> 
>  - kdump kernel uses reserved memory from the old kernel as system RAM,
>    and the old kernel will never use the reserved memory as TDX memory.
>  - /proc/vmcore contains TDX private memory pages.  It's meaningless to
>    read them, but it doesn't do any harm either.
> 
> 5) TDX "partial write machine check" erratum
> 
> On platforms with the TDX erratum, a partial write (a write transaction
> of less than a cacheline landing at the memory controller) to TDX
> private memory poisons that memory, and a subsequent read triggers a
> machine check.  On those platforms, the kernel needs to reset TDX
> private memory before jumping to the new kernel, otherwise the new
> kernel may see an unexpected machine check.
> 
> The kernel currently doesn't track which page is TDX private memory.
> It's not trivial to reset TDX private memory.  For simplicity, this
> series simply disables kexec/kdump for such platforms.  This can be
> enhanced in the future.
> 
> 
> 
> Kai Huang (7):
>   x86/kexec: Consolidate relocate_kernel() function parameters
>   x86/sme: Use percpu boolean to control WBINVD during kexec
>   x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
>   x86/kexec: Disable kexec/kdump on platforms with TDX partial write
>     erratum
>   x86/virt/tdx: Remove the !KEXEC_CORE dependency
>   x86/virt/tdx: Update the kexec section in the TDX documentation
>   KVM: TDX: Explicitly do WBINVD when no more TDX SEAMCALLs
> 
>  Documentation/arch/x86/tdx.rst       | 14 ++++-----
>  arch/x86/Kconfig                     |  1 -
>  arch/x86/include/asm/kexec.h         | 12 ++++++--
>  arch/x86/include/asm/processor.h     |  2 ++
>  arch/x86/include/asm/tdx.h           | 31 +++++++++++++++++++-
>  arch/x86/kernel/cpu/amd.c            | 17 +++++++++++
>  arch/x86/kernel/machine_kexec_64.c   | 43 ++++++++++++++++++++++------
>  arch/x86/kernel/process.c            | 24 +++++++---------
>  arch/x86/kernel/relocate_kernel_64.S | 30 +++++++++++--------
>  arch/x86/kvm/vmx/tdx.c               | 12 ++++++++
>  arch/x86/virt/vmx/tdx/tdx.c          | 16 +++++++++--
>  11 files changed, 155 insertions(+), 47 deletions(-)
> 
> 
> base-commit: e180b3a224cb519388c2f61ca7bc1eaf94cec1fb
