linux-kernel - Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aCVZIuBHx51o7Pbl@yzhao56-desk.sh.intel.com>
Date: Thu, 15 May 2025 11:01:54 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
	<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>,
	<kirill.shutemov@...el.com>, <tabba@...gle.com>, <ackerleytng@...gle.com>,
	<quic_eberman@...cinc.com>, <michael.roth@....com>, <david@...hat.com>,
	<vbabka@...e.cz>, <jroedel@...e.de>, <thomas.lendacky@....com>,
	<pgonda@...gle.com>, <zhiquan1.li@...el.com>, <fan.du@...el.com>,
	<jun.miao@...el.com>, <ira.weiny@...el.com>, <isaku.yamahata@...el.com>,
	<xiaoyao.li@...el.com>, <binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
 pages

On Mon, May 12, 2025 at 09:53:43AM -0700, Vishal Annapurve wrote:
> On Sun, May 11, 2025 at 7:18 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> > ...
> > >
> > > I might be wrongly throwing out some terminologies here then.
> > > VM_PFNMAP flag can be set for memory backed by folios/page structs.
> > > udmabuf seems to be working with pinned "folios" in the backend.
> > >
> > > The goal is to get to a stage where guest_memfd is backed by pfn
> > > ranges unmanaged by kernel that guest_memfd owns and distributes to
> > > userspace, KVM, IOMMU subject to shareability attributes. if the
> > OK. So from point of the reset part of kernel, those pfns are not regarded as
> > memory.
> >
> > > shareability changes, the users will get notified and will have to
> > > invalidate their mappings. guest_memfd will allow mmaping such ranges
> > > with VM_PFNMAP flag set by default in the VMAs to indicate the need of
> > > special handling/lack of page structs.
> > My concern is a failable invalidation notifer may not be ideal.
> > Instead of relying on ref counts (or other mechanisms) to determine whether to
> > start shareabilitiy changes, with a failable invalidation notifier, some users
> > may fail the invalidation and the shareability change, even after other users
> > have successfully unmapped a range.
> 
> Even if one user fails to invalidate its mappings, I don't see a
> reason to go ahead with shareability change. Shareability should not
> change unless all existing users let go of their soon-to-be-invalid
> view of memory.
My thinking is that:

1. guest_memfd starts shared-to-private conversion
2. guest_memfd sends invalidation notifications
   2.1 invalidate notification --> A --> Unmap and return success
   2.2 invalidate notification --> B --> Unmap and return success
   2.3 invalidate notification --> C --> return failure
3. guest_memfd finds 2.3 fails, fails shared-to-private conversion and keeps
   shareability as shared

Though the GFN remains shared after 3, it's unmapped in user A and B in 2.1 and
2.2. Even if additional notifications could be sent to A and B to ask for
mapping the GFN back, the map operation might fail. Consequently, A and B might
not be able to restore the mapped status of the GFN. For IOMMU mappings, this
could result in DMAR failure following a failed attempt to do shared-to-private
conversion.

I noticed Ackerley has posted the series. Will check there later.

> >
> > Auditing whether multiple users of shared memory correctly perform unmapping is
> > harder than auditing reference counts.
> >
> > > private memory backed by page structs and use a special "filemap" to
> > > map file offsets to these private memory ranges. This step will also
> > > need similar contract with users -
> > >    1) memory is pinned by guest_memfd
> > >    2) users will get invalidation notifiers on shareability changes
> > >
> > > I am sure there is a lot of work here and many quirks to be addressed,
> > > let's discuss this more with better context around. A few related RFC
> > > series are planned to be posted in the near future.
> > Ok. Thanks for your time and discussions :)
> > ...