Message-ID: <aBmmirBzOZfmMOJj@yzhao56-desk.sh.intel.com>
Date: Tue, 6 May 2025 14:04:58 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>,
<kirill.shutemov@...el.com>, <tabba@...gle.com>, <ackerleytng@...gle.com>,
<quic_eberman@...cinc.com>, <michael.roth@....com>, <david@...hat.com>,
<vbabka@...e.cz>, <jroedel@...e.de>, <thomas.lendacky@....com>,
<pgonda@...gle.com>, <zhiquan1.li@...el.com>, <fan.du@...el.com>,
<jun.miao@...el.com>, <ira.weiny@...el.com>, <isaku.yamahata@...el.com>,
<xiaoyao.li@...el.com>, <binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
pages
On Mon, May 05, 2025 at 10:08:24PM -0700, Vishal Annapurve wrote:
> On Mon, May 5, 2025 at 5:56 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > Sorry for the late reply; I was on leave last week.
> >
> > On Tue, Apr 29, 2025 at 06:46:59AM -0700, Vishal Annapurve wrote:
> > > On Mon, Apr 28, 2025 at 5:52 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> > > > So, we plan to remove folio_ref_add()/folio_put_refs() in the future, only
> > > > invoking folio_ref_add() in the event of a removal failure.
> > >
> > > In my opinion, the above scheme can be deployed with this series
> > > itself. guest_memfd will not take away memory from TDX VMs without an
> > I initially intended to add a separate patch at the end of this series to
> > invoke folio_ref_add() only upon a removal failure. However, I decided
> > against it since it's not a must before guest_memfd supports in-place
> > conversion.
> >
> > We can include it in the next version if you think it's better.
>
> Ackerley is planning to send out a series for 1G HugeTLB support with
> guest_memfd soon, hopefully this week. Plus, I don't see any reason to
> hold extra refcounts in the TDX stack, so it would be good to clean up
> this logic.
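To illustrate the direction, here is a rough sketch of the removal path I
have in mind (purely illustrative; tdx_remove_private_page() and
tdx_page_remove() below are placeholder names, not the actual functions in
this series):

/*
 * Sketch only: take a folio reference only when the TDX module fails to
 * release the page, so that guest_memfd cannot hand still-mapped memory
 * back to the allocator.
 */
static int tdx_remove_private_page(struct kvm *kvm, gfn_t gfn,
				   struct page *page)
{
	int ret;

	/* Placeholder wrapper around the removal SEAMCALL. */
	ret = tdx_page_remove(kvm, gfn, page);
	if (ret) {
		/*
		 * Removal failed: the page may still be mapped in the
		 * S-EPT, so pin the folio to prevent it from being reused.
		 */
		folio_ref_add(page_folio(page), 1);
		return ret;
	}

	/* Success: no extra reference was ever taken, nothing to drop. */
	return 0;
}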
>
> >
> > > invalidation. folio_ref_add() will not work for memory not backed by
> > > page structs, but that problem can be solved in future possibly by
> > With the current TDX code, all memory must be backed by a page struct.
> > Both tdh_mem_page_add() and tdh_mem_page_aug() require a "struct page *"
> > rather than a pfn.
> >
> > > notifying guest_memfd of certain ranges being in use even after
> > > invalidation completes.
> > A curious question:
> > To support memory not backed by page structs in the future, is there any
> > counterpart to the page struct to hold the refcount and mapcount?
> >
>
> I imagine the needed support will follow semantics similar to VM_PFNMAP
> [1] memory. There is no need to maintain refcounts/mapcounts for such
> physical memory ranges, as all users will be notified when mappings are
> changed/removed.
So, would it be possible to map such memory into both the shared and private
EPT simultaneously?
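
Just to double-check my understanding of those semantics, here is a minimal,
purely illustrative sketch (the example_* names are hypothetical, not
guest_memfd code): the provider inserts raw pfns as special PTEs, takes no
refcount/mapcount, and is responsible for invalidating all users before it
reuses the physical range.

static vm_fault_t example_pfnmap_fault(struct vm_fault *vmf)
{
	/* Hypothetical lookup of the pfn backing this file offset. */
	unsigned long pfn = example_lookup_pfn(vmf->vma->vm_file, vmf->pgoff);

	/* Inserts a special PTE; no struct page refcount is taken. */
	return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
}

static const struct vm_operations_struct example_vm_ops = {
	.fault = example_pfnmap_fault,
};

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
	vma->vm_ops = &example_vm_ops;
	return 0;
}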
> Any guest_memfd range updates will result in invalidations/updates of
> userspace, guest, IOMMU, or any other page tables referring to
> guest_memfd-backed pfns. This story will become clearer once support for
> a PFN range allocator backing guest_memfd starts getting discussed.
Ok. It is indeed unclear right now how that kind of memory would be supported.
For now, we don't anticipate that TDX will allow any mapping of VM_PFNMAP
memory into the private EPT until TDX Connect.
And even in that scenario, the memory is only for private MMIO, so the backing
driver is the VFIO PCI driver rather than guest_memfd.
> [1] https://elixir.bootlin.com/linux/v6.14.5/source/mm/memory.c#L6543