lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aLDjpe31-w6md-GV@google.com>
Date: Thu, 28 Aug 2025 16:17:57 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Ira Weiny <ira.weiny@...el.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	"pbonzini@...hat.com" <pbonzini@...hat.com>, Vishal Annapurve <vannapurve@...gle.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Yan Y Zhao <yan.y.zhao@...el.com>, 
	"michael.roth@....com" <michael.roth@....com>
Subject: Re: [RFC PATCH 09/12] KVM: TDX: Fold tdx_mem_page_record_premap_cnt()
 into its sole caller

On Thu, Aug 28, 2025, Ira Weiny wrote:
> Edgecombe, Rick P wrote:
> > On Thu, 2025-08-28 at 13:26 -0700, Sean Christopherson wrote:
> > > Me confused.  This is pre-boot, not the normal fault path, i.e. blocking other
> > > operations is not a concern.
> > 
> > Just was my recollection of the discussion. I found it:
> > https://lore.kernel.org/lkml/Zbrj5WKVgMsUFDtb@google.com/
> > 
> > > 
> > > If tdh_mr_extend() is too heavy for a non-preemptible section, then the current
> > > code is also broken in the sense that there are no cond_resched() calls.  The
> > > vast majority of TDX hosts will be using non-preemptible kernels, so without an
> > > explicit cond_resched(), there's no practical difference between extending the
> > > measurement under mmu_lock versus outside of mmu_lock.
> > > 
> > > _If_ we need/want to do tdh_mr_extend() outside of mmu_lock, we can and should
> > > still do tdh_mem_page_add() under mmu_lock.
> > 
> > I just did a quick test and we should be on the order of <1 ms per page for the
> > full loop. I can try to get some more formal test data if it matters. But that
> > doesn't sound too horrible?
> > 
> > tdh_mr_extend() outside MMU lock is tempting because it doesn't *need* to be
> > inside it.
> 
> I'm probably not following this conversation, so stupid question:  It
> doesn't need to be in the lock because user space should not be setting up
> memory and extending the measurement in an asynchronous way.  Is that
> correct?

No, from userspace's perspective ADD+MEASURE is fully serialized.  ADD "needs"
to be under mmu_lock to guarantee consistency between the mirror EPT and the
"real" S-EPT entries.  E.g. if ADD is done after the fact, then KVM can end up
with a PRESENT M-EPT entry but a corresponding S-EPT entry that is !PRESENT.
That causes a pile of problems because it breaks KVM's fundamental assumption
that M-EPT and S-EPT entries updated in lock-step.

TDH_MR_EXTEND doesn't have the same same consistency issue.  If it fails, the
only thing that's left in a bad state is the measurement.  That's obviously not
ideal either, but we can handle that by forcefully terminating the VM, without
opening up KVM to edge cases that would otherwise be impossible.

> > But maybe a better reason is that we could better handle errors
> > outside the fault. (i.e. no 5 line comment about why not to return an error in
> > tdx_mem_page_add() due to code in another file).
> > 
> > I wonder if Yan can give an analysis of any zapping races if we do that.
> 
> When you say analysis, you mean detecting user space did something wrong
> and failing gracefully?  Is that correct?

More specifically, whether or not KVM can WARN without the WARN being user
triggerable.  Kernel policy is that WARNs must not be triggerable absent kernel,
hardware, or firmware bugs.  What we're trying to figure out is if there's a
flow that can be triggered by userspace (misbehving or not) that would trip a
WARN even if KVM is operating as expected.  I'm pretty sure the answer is "no".

Oh, and WARNing here is desirable, because it improves the chances of detecting
a fatal-to-the-VM bug, e.g. in KVM and/or in the TDX-Module.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ