linux-kernel - Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGtprH88FAYhZdEP94wS9=swMX=CxpMKZs8+jxdrXsf0dv-tZQ@mail.gmail.com>
Date: Fri, 9 Jan 2026 09:16:48 -0800
From: Vishal Annapurve <vannapurve@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, Sean Christopherson <seanjc@...gle.com>, pbonzini@...hat.com, 
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org, x86@...nel.org, 
	rick.p.edgecombe@...el.com, dave.hansen@...el.com, kas@...nel.org, 
	tabba@...gle.com, michael.roth@....com, david@...nel.org, sagis@...gle.com, 
	vbabka@...e.cz, thomas.lendacky@....com, nik.borisov@...e.com, 
	pgonda@...gle.com, fan.du@...el.com, jun.miao@...el.com, 
	francescolavra.fl@...il.com, jgross@...e.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, xiaoyao.li@...el.com, kai.huang@...el.com, 
	binbin.wu@...ux.intel.com, chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

On Fri, Jan 9, 2026 at 8:12 AM Vishal Annapurve <vannapurve@...gle.com> wrote:
>
> > > >
> > >
> > > I think the central question I have among all the above is what TDX
> > > needs to actually care about (putting aside what KVM's folio size/memory
> > > contiguity vs mapping level rule for a while).
> > >
> > > I think TDX code can check what it cares about (if required to aid
> > > debugging, as Dave suggested). Does TDX actually care about folio sizes,
> > > or does it actually care about memory contiguity and alignment?
> > TDX cares about memory contiguity. A single folio ensures memory contiguity.
>
> In this slightly unusual case, I think the guarantee needed here is
> that as long as a range is mapped into SEPT entries, guest_memfd
> ensures that the complete range stays private.
>
> i.e. I think it should be safe to rely on guest_memfd here,
> irrespective of the folio sizes:
> 1) KVM TDX stack should be able to reclaim the complete range when unmapping.
> 2) KVM TDX stack can assume that as long as memory is mapped in SEPT
> entries, guest_memfd will not let host userspace mappings to access
> guest private memory.
>
> >
> > Allowing one S-EPT mapping to cover multiple folios may also mean it's no longer
> > reasonable to pass "struct page" to tdh_phymem_page_wbinvd_hkid() for a
> > contiguous range larger than the page's folio range.
>
> What's the issue with passing the (struct page*, unsigned long nr_pages) pair?
>
> >
> > Additionally, we don't split private mappings in kvm_gmem_error_folio().
> > If smaller folios are allowed, splitting private mapping is required there.
>
> Yes, I believe splitting private mappings will be invoked to ensure
> that the whole huge folio is not unmapped from KVM due to an error on
> just a 4K page. Is that a problem?
>
> If splitting fails, the implementation can fall back to completely
> zapping the folio range.

I forgot to mention that this is a future improvement that will
introduce hugetlb memory failure handling and is not covered by
Ackerley's current set of patches.

>
> > (e.g., after splitting a 1GB folio to 4KB folios with 2MB mappings. Also, is it
> > possible for splitting a huge folio to fail partially, without merging the huge
> > folio back or further zapping?).
>
> Yes, splitting can fail partially, but guest_memfd will not make the
> ranges available to host userspace and derivatives until:
> 1) The complete range to be converted is split to 4K granularity.
> 2) The complete range to be converted is zapped from KVM EPT mappings.
>
> > Not sure if there're other edge cases we're still missing.