lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aK9Xqy0W1ghonWUL@google.com>
Date: Wed, 27 Aug 2025 12:08:27 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Michael Roth <michael.roth@....com>, Ira Weiny <ira.weiny@...el.com>, 
	Vishal Annapurve <vannapurve@...gle.com>, Rick Edgecombe <rick.p.edgecombe@...el.com>
Subject: Re: [RFC PATCH 09/12] KVM: TDX: Fold tdx_mem_page_record_premap_cnt()
 into its sole caller

On Wed, Aug 27, 2025, Yan Zhao wrote:
> On Tue, Aug 26, 2025 at 05:05:19PM -0700, Sean Christopherson wrote:
> > @@ -1641,14 +1618,30 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> >  		return -EIO;
> >  
> >  	/*
> > -	 * Read 'pre_fault_allowed' before 'kvm_tdx->state'; see matching
> > -	 * barrier in tdx_td_finalize().
> > +	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
> > +	 * before kvm_tdx->state.  Userspace must not be allowed to pre-fault
> > +	 * arbitrary memory until the initial memory image is finalized.  Pairs
> > +	 * with the smp_wmb() in tdx_td_finalize().
> >  	 */
> >  	smp_rmb();
> > -	if (likely(kvm_tdx->state == TD_STATE_RUNNABLE))
> > -		return tdx_mem_page_aug(kvm, gfn, level, pfn);
> >  
> > -	return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn);
> > +	/*
> > +	 * If the TD isn't finalized/runnable, then userspace is initializing
> > +	 * the VM image via KVM_TDX_INIT_MEM_REGION.  Increment the number of
> > +	 * pages that need to be initialized via TDH.MEM.PAGE.ADD (PAGE.ADD
> > +	 * requires a pre-existing S-EPT mapping).  KVM_TDX_FINALIZE_VM checks
> > +	 * the counter to ensure all mapped pages have been added to the image,
> > +	 * to prevent running the TD with uninitialized memory.
> To prevent the mismatch between mirror EPT and the S-EPT?

No?  Because KVM bumps the count when installing the S-EPT and decrements it
on AUG, so I don't see how nr_premapped guards against M-EPT vs. S-EPT issues?

> e.g., Before KVM_TDX_FINALIZE_VM, if userspace performs a zap after the
> TDH.MEM.PAGE.ADD, the page will be removed from the S-EPT. The count of
> nr_premapped will not change after the successful TDH.MEM.RANGE.BLOCK and
> TDH.MEM.PAGE.REMOVE.

Eww.  It would be nice to close that hole, but I suppose it's futile, e.g. the
underlying problem is unexpectedly removing pages from the initial, whether the
VMM is doing stupid things before vs. after FINALIZE doesn't really matter.

> As a result, the TD will still run with uninitialized memory.

No?  Because BLOCK+REMOVE means there are no valid S-EPT mappings.  There's a
"hole" that the guest might not expect, but that hole will trigger an EPT
violation and only get "filled" if the guest explicitly accepts an AUG'd page.

Side topic, why does KVM tolerate tdh_mem_page_add() failure?  IIUC, playing
nice with tdh_mem_page_add() failure necessitates both the
tdx_is_sept_zap_err_due_to_premap() craziness and the check in tdx_td_finalize()
that all pending pages have been consumed.

What reasonable use case is there for gracefully handling tdh_mem_page_add() failure?

If there is a need to handle failure, I gotta imagine it's only for the -EBUSY
case.  And if it's only for -EBUSY, why can't that be handled by retrying in
tdx_vcpu_init_mem_region()?  If tdx_vcpu_init_mem_region() guarantees that all
pages mapped into the S-EPT are ADDed, then it can assert that there are no
pending pages when it completes (even if it "fails"), and similarly
tdx_td_finalize() can KVM_BUG_ON/WARN_ON the number of pending pages being
non-zero.

> > +	 */
> > +	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE)) {
> > +		if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm))
> > +			return -EIO;
> > +
> > +		atomic64_inc(&kvm_tdx->nr_premapped);
> > +		return 0;
> > +	}
> > +
> > +	return tdx_mem_page_aug(kvm, gfn, level, pfn);
> >  }
> >  
> >  static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
> > -- 
> > 2.51.0.268.g9569e192d0-goog
> > 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ