Message-ID: <aWditFmWX3kOvPiB@yzhao56-desk.sh.intel.com>
Date: Wed, 14 Jan 2026 17:32:36 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: Ackerley Tng <ackerleytng@...gle.com>, Sean Christopherson
<seanjc@...gle.com>, <pbonzini@...hat.com>, <linux-kernel@...r.kernel.org>,
<kvm@...r.kernel.org>, <x86@...nel.org>, <rick.p.edgecombe@...el.com>,
<dave.hansen@...el.com>, <kas@...nel.org>, <tabba@...gle.com>,
<michael.roth@....com>, <david@...nel.org>, <sagis@...gle.com>,
<vbabka@...e.cz>, <thomas.lendacky@....com>, <nik.borisov@...e.com>,
<pgonda@...gle.com>, <fan.du@...el.com>, <jun.miao@...el.com>,
<francescolavra.fl@...il.com>, <jgross@...e.com>, <ira.weiny@...el.com>,
<isaku.yamahata@...el.com>, <xiaoyao.li@...el.com>, <kai.huang@...el.com>,
<binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>, <chao.gao@...el.com>
Subject: Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory
On Tue, Jan 13, 2026 at 08:40:11AM -0800, Vishal Annapurve wrote:
> On Mon, Jan 12, 2026 at 10:13 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > > >> > >>
> > > >> > >> Additionally, we don't split private mappings in kvm_gmem_error_folio().
> > > >> > >> If smaller folios are allowed, splitting private mappings is required there.
> > > >> >
> > > >> > It was discussed before that for memory failure handling, we will want
> > > >> > to split huge pages, and we will get to it! The trouble is that guest_memfd
> > > >> > took the page from HugeTLB (unlike buddy or HugeTLB, which manage memory
> > > >> > from the ground up), so we'll still need to figure out whether it's okay to
> > > >> > let HugeTLB deal with it when freeing. When I last looked, HugeTLB
> > > >> > didn't actually deal with poisoned folios on freeing, so there's more
> > > >> > work to do on the HugeTLB side.
> > > >> >
> > > >> > This is a good point, although IIUC it is a separate issue. The need to
> > > >> > split private mappings on memory failure is not for confidentiality in
> > > >> > the TDX sense but to ensure that the guest doesn't use the failed
> > > >> > memory. In that case, contiguity is broken by the failed memory. The
> > > >> > folio is split, the private EPTs are split. The folio size should still
> > > >> > not be checked in TDX code. guest_memfd knows contiguity got broken, so
> > > >> > guest_memfd calls TDX code to split the EPTs.
> > > >>
> > > >> Hmm, maybe the key is that we need to split the S-EPT before allowing
> > > >> guest_memfd to split the backing folio. If splitting the S-EPT fails, don't
> > > >> split the folio.
> > > >>
> > > >> This is better than splitting the folio while it's still mapped huge in the
> > > >> S-EPT, since in the latter case kvm_gmem_error_folio() needs to try to split
> > > >> the S-EPT. If that S-EPT split fails, falling back to zapping the huge mapping
> > > >> in kvm_gmem_error_folio() would still trigger the over-zapping issue.
> > > >>
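For concreteness, the ordering I mean could look roughly like the sketch
below. It's plain C with stubbed helpers; kvm_split_sept() and
gmem_split_folio() are made-up names, not the actual KVM/guest_memfd
interfaces.

#include <stdio.h>

/* Stub: the real S-EPT split can fail, e.g. -ENOMEM for page tables. */
static int kvm_split_sept(unsigned long gfn)
{
	printf("splitting S-EPT at gfn 0x%lx\n", gfn);
	return 0;
}

/* Stub: split the backing folio in guest_memfd. */
static int gmem_split_folio(unsigned long index)
{
	printf("splitting backing folio at index 0x%lx\n", index);
	return 0;
}

/*
 * Split the S-EPT mapping first; only if that succeeds is the backing
 * folio split. On S-EPT split failure the folio stays huge, so the
 * mapping and the backing store never disagree.
 */
static int gmem_split(unsigned long gfn, unsigned long index)
{
	int ret = kvm_split_sept(gfn);

	if (ret)
		return ret;

	return gmem_split_folio(index);
}

int main(void)
{
	return gmem_split(0x100000, 0);
}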
> > >
> > > Let's put memory failure handling aside for now, since it currently zaps
> > > the entire huge page, so there's no ordering dependency between the S-EPT
> > > split and the folio split.
> > Relying on guest_memfd's specific implementation is not a good idea, e.g.:
> >
> > Suppose there's a version of guest_memfd that allocates folios from the
> > buddy allocator.
> > 1. KVM maps a 2MB folio with a 2MB mapping.
> > 2. guest_memfd splits the 2MB folio into 4KB folios, but fails and leaves the
> > 2MB folio partially split.
> > 3. Memory failure occurs on one of the split folios.
> > 4. When splitting the S-EPT fails, the over-zapping issue is still there.
> >
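To illustrate step 4, the poison handler would be stuck with something like
the sketch below (hypothetical helper names; the real kvm_gmem_error_folio()
differs). With the folio only partially split and the S-EPT still mapped at
2MB, a failed S-EPT split leaves no choice but to zap the whole 2MB range:

#include <stdio.h>

#define PTRS_PER_PMD	512UL

/* Stubs with made-up names. Pretend the S-EPT split fails here. */
static int kvm_split_sept(unsigned long gfn)
{
	return -1;
}

static void kvm_zap_page(unsigned long gfn)
{
	printf("zap 4KB at gfn 0x%lx\n", gfn);
}

static void kvm_zap_range(unsigned long start, unsigned long end)
{
	printf("zap gfns 0x%lx-0x%lx (over-zap)\n", start, end);
}

static void gmem_handle_poison(unsigned long gfn_4k)
{
	unsigned long gfn_2m = gfn_4k & ~(PTRS_PER_PMD - 1);

	if (!kvm_split_sept(gfn_2m)) {
		kvm_zap_page(gfn_4k);	/* zap only the bad 4KB page */
		return;
	}

	/* Split failed: zap all 512 pages to fence off the bad one. */
	kvm_zap_range(gfn_2m, gfn_2m + PTRS_PER_PMD);
}

int main(void)
{
	gmem_handle_poison(0x100042);
	return 0;
}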
>
> Why is over-zapping an issue?
> Memory failure is supposed to be a rare occurrence, and if there is no
> memory to handle the splitting, I don't see any other choice than
> over-zapping. IIUC, splitting the huge page range (in the 1G -> 4K scenario)
> requires even more memory than just splitting the cross-boundary leaves
> and has a higher chance of failing.
>
> I.e., whether the folio is split first or the S-EPTs, there is always a
> chance of failure leading to over-zapping. I don't see value in
Hmm. If the split occurs after the memory failure, then yes, splitting the
S-EPT first also has a chance of over-zapping. But if the split occurs earlier,
during a private-to-shared conversion for the non-converted range, then
over-zapping can be avoided when memory failure later hits the already-split
folio.
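That conversion-time path might look like the sketch below (hypothetical
names again). The S-EPT is split up front, while the conversion can still
fail cleanly; a later memory failure on the already-split range then zaps
only the bad 4KB page:

#include <stdio.h>

/* Stub helpers, same made-up names as before. */
static int kvm_split_sept(unsigned long gfn)
{
	printf("split S-EPT at gfn 0x%lx\n", gfn);
	return 0;
}

static void kvm_zap_page(unsigned long gfn)
{
	printf("zap 4KB at gfn 0x%lx\n", gfn);
}

static int gmem_mark_shared(unsigned long gfn)
{
	printf("gfn 0x%lx is now shared\n", gfn);
	return 0;
}

/* Convert one 4KB page of a 2MB private mapping to shared. */
static int gmem_convert_to_shared(unsigned long gfn_4k)
{
	unsigned long gfn_2m = gfn_4k & ~511UL;
	int ret;

	ret = kvm_split_sept(gfn_2m);	/* split before touching folios */
	if (ret)
		return ret;		/* fail the conversion, zap nothing */

	kvm_zap_page(gfn_4k);		/* unmap just the page being converted */
	return gmem_mark_shared(gfn_4k);
}

int main(void)
{
	return gmem_convert_to_shared(0x100001);
}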
> optimizing rare failures within rarer memory failure handling
> codepaths, which are supposed to make best-effort decisions anyway.
I agree it's low priority.
I'm just not sure whether there are edge cases besides this one.