Message-ID: <68b0d2fb207cc_27c6d294e1@iweiny-mobl.notmuch>
Date: Thu, 28 Aug 2025 17:06:51 -0500
From: Ira Weiny <ira.weiny@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "seanjc@...gle.com"
<seanjc@...gle.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "Annapurve, Vishal" <vannapurve@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Zhao, Yan Y"
<yan.y.zhao@...el.com>, "michael.roth@....com" <michael.roth@....com>,
"Weiny, Ira" <ira.weiny@...el.com>
Subject: Re: [RFC PATCH 09/12] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller

Edgecombe, Rick P wrote:
> On Thu, 2025-08-28 at 13:26 -0700, Sean Christopherson wrote:
> > Me confused. This is pre-boot, not the normal fault path, i.e. blocking other
> > operations is not a concern.
>
> Just was my recollection of the discussion. I found it:
> https://lore.kernel.org/lkml/Zbrj5WKVgMsUFDtb@google.com/
>
> >
> > If tdh_mr_extend() is too heavy for a non-preemptible section, then the current
> > code is also broken in the sense that there are no cond_resched() calls. The
> > vast majority of TDX hosts will be using non-preemptible kernels, so without an
> > explicit cond_resched(), there's no practical difference between extending the
> > measurement under mmu_lock versus outside of mmu_lock.
> >
> > _If_ we need/want to do tdh_mr_extend() outside of mmu_lock, we can and should
> > still do tdh_mem_page_add() under mmu_lock.
>
> I just did a quick test and we should be on the order of <1 ms per page for the
> full loop. I can try to get some more formal test data if it matters. But that
> doesn't sound too horrible?
>
> tdh_mr_extend() outside MMU lock is tempting because it doesn't *need* to be
> inside it.

I'm probably not following this conversation, so a stupid question: it
doesn't need to be under the lock because user space should not be setting
up memory and extending the measurement asynchronously. Is that correct?

> But maybe a better reason is that we could better handle errors
> outside the fault. (i.e. no 5 line comment about why not to return an error in
> tdx_mem_page_add() due to code in another file).
>
> I wonder if Yan can give an analysis of any zapping races if we do that.

When you say analysis, do you mean detecting that user space did something
wrong and failing gracefully?

Ira