linux-kernel - Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGtprH8GuKYctwJFJHcztx=q9hfwQF+1_7e8=h9r3u1V_GgzmQ@mail.gmail.com>
Date: Tue, 13 Jan 2026 08:40:11 -0800
From: Vishal Annapurve <vannapurve@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, Sean Christopherson <seanjc@...gle.com>, pbonzini@...hat.com, 
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org, x86@...nel.org, 
	rick.p.edgecombe@...el.com, dave.hansen@...el.com, kas@...nel.org, 
	tabba@...gle.com, michael.roth@....com, david@...nel.org, sagis@...gle.com, 
	vbabka@...e.cz, thomas.lendacky@....com, nik.borisov@...e.com, 
	pgonda@...gle.com, fan.du@...el.com, jun.miao@...el.com, 
	francescolavra.fl@...il.com, jgross@...e.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, xiaoyao.li@...el.com, kai.huang@...el.com, 
	binbin.wu@...ux.intel.com, chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

On Mon, Jan 12, 2026 at 10:13 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
>
> > >> > >>
> > >> > >> Additionally, we don't split private mappings in kvm_gmem_error_folio().
> > >> > >> If smaller folios are allowed, splitting private mapping is required there.
> > >> >
> > >> > It was discussed before that for memory failure handling, we will want
> > >> > to split huge pages, we will get to it! The trouble is that guest_memfd
> > >> > took the page from HugeTLB (unlike buddy or HugeTLB which manages memory
> > >> > from the ground up), so we'll still need to figure out it's okay to let
> > >> > HugeTLB deal with it when freeing, and when I last looked, HugeTLB
> > >> > doesn't actually deal with poisoned folios on freeing, so there's more
> > >> > work to do on the HugeTLB side.
> > >> >
> > >> > This is a good point, although IIUC it is a separate issue. The need to
> > >> > split private mappings on memory failure is not for confidentiality in
> > >> > the TDX sense but to ensure that the guest doesn't use the failed
> > >> > memory. In that case, contiguity is broken by the failed memory. The
> > >> > folio is split, the private EPTs are split. The folio size should still
> > >> > not be checked in TDX code. guest_memfd knows contiguity got broken, so
> > >> > guest_memfd calls TDX code to split the EPTs.
> > >>
> > >> Hmm, maybe the key is that we need to split S-EPT first before allowing
> > >> guest_memfd to split the backend folio. If splitting S-EPT fails, don't do the
> > >> folio splitting.
> > >>
> > >> This is better than performing folio splitting while it's mapped as huge in
> > >> S-EPT, since in the latter case, kvm_gmem_error_folio() needs to try to split
> > >> S-EPT. If the S-EPT splitting fails, falling back to zapping the huge mapping in
> > >> kvm_gmem_error_folio() would still trigger the over-zapping issue.
> > >>
> >
> > Let's put memory failure handling aside for now since for now it zaps
> > the entire huge page, so there's no impact on ordering between S-EPT and
> > folio split.
> Relying on guest_memfd's specific implemenation is not a good thing. e.g.,
>
> Given there's a version of guest_memfd allocating folios from buddy.
> 1. KVM maps a 2MB folio in a 2MB mappings.
> 2. guest_memfd splits the 2MB folio into 4KB folios, but fails and leaves the
>    2MB folio partially split.
> 3. Memory failure occurs on one of the split folio.
> 4. When splitting S-EPT fails, the over-zapping issue is still there.
>

Why is overzapping an issue?

Memory failure is supposed to be a rare occurrence and if there is no
memory to handle the splitting, I don't see any other choice than
overzapping. IIUC splitting the huge page range (in 1G -> 4K scenario)
requires even more memory than just splitting cross-boundary leaves
and has a higher chance of failing.

i.e. Whether the folio is split first or the SEPTs, there is always a
chance of failure leading to over-zapping. I don't see value in
optimizing rare failures within rarer memory failure handling
codepaths which are supposed to make best-effort decisions anyway.