Message-ID: <aV2eIalRLSEGozY0@google.com>
Date: Tue, 6 Jan 2026 15:43:29 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Ackerley Tng <ackerleytng@...gle.com>
Cc: Vishal Annapurve <vannapurve@...gle.com>, Yan Zhao <yan.y.zhao@...el.com>, pbonzini@...hat.com, 
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org, x86@...nel.org, 
	rick.p.edgecombe@...el.com, dave.hansen@...el.com, kas@...nel.org, 
	tabba@...gle.com, michael.roth@....com, david@...nel.org, sagis@...gle.com, 
	vbabka@...e.cz, thomas.lendacky@....com, nik.borisov@...e.com, 
	pgonda@...gle.com, fan.du@...el.com, jun.miao@...el.com, 
	francescolavra.fl@...il.com, jgross@...e.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, xiaoyao.li@...el.com, kai.huang@...el.com, 
	binbin.wu@...ux.intel.com, chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

On Tue, Jan 06, 2026, Ackerley Tng wrote:
> Sean Christopherson <seanjc@...gle.com> writes:
> 
> > On Tue, Jan 06, 2026, Ackerley Tng wrote:
> >> Vishal Annapurve <vannapurve@...gle.com> writes:
> >>
> >> > On Tue, Jan 6, 2026 at 2:19 AM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >> >>
> >> >> - EPT mapping size and folio size
> >> >>
> >> >>   This series is built upon the rule in KVM that the mapping size in the
> >> >>   KVM-managed secondary MMU is no larger than the backend folio size.
> >> >>
> >>
> >> I'm not familiar with this rule and would like to find out more. Why is
> >> this rule imposed?
> >
> > Because it's the only sane way to safely map memory into the guest? :-D
> >
> >> Is this rule there just because traditionally folio sizes also define the
> >> limit of contiguity, and so the mapping size must not be greater than folio
> >> size in case the block of memory represented by the folio is not contiguous?
> >
> > Pre-guest_memfd, KVM didn't care about folios.  KVM's mapping size was (and still
> > is) strictly bound by the host mapping size.  That handles contiguous addresses,
> > but it _also_ handles contiguous protections (e.g. RWX) and other attributes.
> >
> >> In guest_memfd's case, even if the folio is split (just for refcount
> >> tracking purposes on private to shared conversion), the memory is still
> >> contiguous up to the original folio's size. Will the contiguity address
> >> the concerns?
> >
> > Not really?  Why would the folio be split if the memory _and its attributes_ are
> > fully contiguous?  If the attributes are mixed, KVM must not create a mapping
> > spanning mixed ranges, i.e. with multiple folios.
> 
> The folio can be split if any (or all) of the pages in a huge page range
> are shared (in the CoCo sense). So in a 1G block of memory, even if the
> attributes all read 0 (!KVM_MEMORY_ATTRIBUTE_PRIVATE), the folio
> would be split, and the split folios are necessary for tracking users of
> shared pages using struct page refcounts.

Ahh, that's what the refcounting was referring to.  Gotcha.

> However the split folios in that 1G range are still fully contiguous.
> 
> The process of conversion will split the EPT entries soon after the
> folios are split so the rule remains upheld.
> 
> I guess perhaps the question is, is it okay if the folios are smaller
> than the mapping while conversion is in progress? Does the order matter
> (split page table entries first vs split folios first)?

Mapping a hugepage for memory that KVM _knows_ is contiguous and homogeneous is
conceptually totally fine, i.e. I'm not totally opposed to adding support for
mapping multiple guest_memfd folios with a single hugepage.  As to whether we
do (a) nothing, (b) change the refcounting, or (c) add support for mapping
multiple folios in one hugepage, that probably comes down to which option provides
"good enough" performance without incurring too much complexity.
