linux-kernel - Re: [RFC PATCH v5 08/45] KVM: x86/mmu: Propagate mirror SPTE removal to S-EPT in handle_changed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aYQtIK/Lq5T3ad6V@yzhao56-desk.sh.intel.com>
Date: Thu, 5 Feb 2026 13:39:44 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
	Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
	<x86@...nel.org>, Kiryl Shutsemau <kas@...nel.org>, Paolo Bonzini
	<pbonzini@...hat.com>, <linux-kernel@...r.kernel.org>,
	<linux-coco@...ts.linux.dev>, <kvm@...r.kernel.org>, Kai Huang
	<kai.huang@...el.com>, Rick Edgecombe <rick.p.edgecombe@...el.com>, "Vishal
 Annapurve" <vannapurve@...gle.com>, Ackerley Tng <ackerleytng@...gle.com>,
	Sagi Shahar <sagis@...gle.com>, Binbin Wu <binbin.wu@...ux.intel.com>,
	Xiaoyao Li <xiaoyao.li@...el.com>, Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [RFC PATCH v5 08/45] KVM: x86/mmu: Propagate mirror SPTE removal
 to S-EPT in handle_changed_spte()

On Wed, Feb 04, 2026 at 06:23:38PM -0800, Sean Christopherson wrote:
> On Wed, Feb 04, 2026, Yan Zhao wrote:
> > On Wed, Jan 28, 2026 at 05:14:40PM -0800, Sean Christopherson wrote:
> > > @@ -590,10 +566,21 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> > >  	 * the paging structure.  Note the WARN on the PFN changing without the
> > >  	 * SPTE being converted to a hugepage (leaf) or being zapped.  Shadow
> > >  	 * pages are kernel allocations and should never be migrated.
> > > +	 *
> > > +	 * When removing leaf entries from a mirror, immediately propagate the
> > > +	 * changes to the external page tables.  Note, non-leaf mirror entries
> > > +	 * are handled by handle_removed_pt(), as TDX requires that all leaf
> > > +	 * entries are removed before the owning page table.  Note #2, writes
> > > +	 * to make mirror PTEs shadow-present are propagated to external page
> > > +	 * tables by __tdp_mmu_set_spte_atomic(), as KVM needs to ensure the
> > > +	 * external page table was successfully updated before marking the
> > > +	 * mirror SPTE present.
> > >  	 */
> > >  	if (was_present && !was_leaf &&
> > >  	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
> > >  		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
> > > +	else if (was_leaf && is_mirror_sptep(sptep) && !is_leaf)
> > Should we check !is_present instead of !is_leaf?
> > e.g. a transition from a present leaf entry to a present non-leaf entry could
> > also trigger this if case.
> 
> No, the !is_leaf check is very intentional.  At this point in the series, S-EPT
> doesn't support hugepages.  If KVM manages to install a leaf SPTE and replaces
> that SPTE with a non-leaf SPTE, then we absolutely want the KVM_BUG_ON() in
> tdx_sept_remove_private_spte() to fire:
> 
> 	/* TODO: handle large pages. */
> 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
> 		return -EIO;
But the op is named remove_external_spte().
And the check of "level != PG_LEVEL_4K" is for removing large leaf entries.
Relying on this check is tricky and confusing.

> And then later on, when S-EPT gains support for hugepages, "KVM: TDX: Add core
> support for splitting/demoting 2MiB S-EPT to 4KiB" doesn't need to touch code
> outside of arch/x86/kvm/vmx/tdx.c, because everything has already been plumbed
> in.
I haven't looked at the later patches for huge pages, but plumbing here directly
for splitting does not look right when it's invoked under shared mmu_lock.
See the comment below.
 
> > Besides, need "KVM_BUG_ON(shared, kvm)" in this case.
> 
> Eh, we have lockdep_assert_held_write() in the S-EPT paths that require mmu_lock
> to be held for write.  I don't think a KVM_BUG_ON() here would add meaningful
> value.
Hmm, I think KVM_BUG_ON(shared, kvm) is still useful.
If KVM invokes remove_external_spte() under shared mmu_lock, it needs to freeze
the entry first, similar to the sequence in __tdp_mmu_set_spte_atomic().

i.e., invoking external x86 ops in handle_changed_spte() for mirror roots should
be !shared only.

Relying on the TDX code's lockdep_assert_held_write() for warning seems less
clear than having an explicit check here.