Message-ID: <aCVZIuBHx51o7Pbl@yzhao56-desk.sh.intel.com>
Date: Thu, 15 May 2025 11:01:54 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>,
<kirill.shutemov@...el.com>, <tabba@...gle.com>, <ackerleytng@...gle.com>,
<quic_eberman@...cinc.com>, <michael.roth@....com>, <david@...hat.com>,
<vbabka@...e.cz>, <jroedel@...e.de>, <thomas.lendacky@....com>,
<pgonda@...gle.com>, <zhiquan1.li@...el.com>, <fan.du@...el.com>,
<jun.miao@...el.com>, <ira.weiny@...el.com>, <isaku.yamahata@...el.com>,
<xiaoyao.li@...el.com>, <binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
pages
On Mon, May 12, 2025 at 09:53:43AM -0700, Vishal Annapurve wrote:
> On Sun, May 11, 2025 at 7:18 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> > ...
> > >
> > > I might be wrongly throwing out some terminologies here then.
> > > VM_PFNMAP flag can be set for memory backed by folios/page structs.
> > > udmabuf seems to be working with pinned "folios" in the backend.
> > >
> > > The goal is to get to a stage where guest_memfd is backed by pfn
> > > ranges unmanaged by kernel that guest_memfd owns and distributes to
> > > userspace, KVM, IOMMU subject to shareability attributes. if the
> > OK. So from the point of view of the rest of the kernel, those pfns are not
> > regarded as memory.
> >
> > > shareability changes, the users will get notified and will have to
> > > invalidate their mappings. guest_memfd will allow mmaping such ranges
> > > with VM_PFNMAP flag set by default in the VMAs to indicate the need of
> > > special handling/lack of page structs.
> > My concern is that a failable invalidation notifier may not be ideal.
> > If we rely on a failable invalidation notifier, rather than on ref counts
> > (or other mechanisms), to determine whether to start a shareability change,
> > some users may fail the invalidation, and thus the shareability change,
> > even after other users have successfully unmapped the range.
>
> Even if one user fails to invalidate its mappings, I don't see a
> reason to go ahead with shareability change. Shareability should not
> change unless all existing users let go of their soon-to-be-invalid
> view of memory.
My thinking is that:
1. guest_memfd starts shared-to-private conversion
2. guest_memfd sends invalidation notifications
   2.1 invalidate notification --> A --> Unmap and return success
   2.2 invalidate notification --> B --> Unmap and return success
   2.3 invalidate notification --> C --> return failure
3. guest_memfd finds that 2.3 failed, fails the shared-to-private conversion
   and keeps shareability as shared

Though the GFN remains shared after 3, it has already been unmapped in users A
and B in 2.1 and 2.2. Even if additional notifications could be sent to A and B
to ask them to map the GFN back, the map operation might fail. Consequently, A
and B might not be able to restore the mapped status of the GFN. For IOMMU
mappings, this could result in DMAR failures following a failed attempt to do
shared-to-private conversion.
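To make the sequence concrete, here is a minimal toy model in Python. It is
only a sketch and not kernel code: the names User, invalidate, remap and
convert_shared_to_private are hypothetical, and the can_unmap/can_remap flags
are assumptions standing in for whatever could make a real unmap or re-map
fail.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical user of a shared GFN (e.g. userspace, KVM, IOMMU)."""
    name: str
    can_unmap: bool = True   # assumption: invalidation may fail for this user
    can_remap: bool = True   # assumption: a rollback re-map may fail too
    mapped: bool = True      # the GFN starts out mapped in every user

    def invalidate(self) -> bool:
        """Steps 2.x: unmap on notification and report success or failure."""
        if not self.can_unmap:
            return False
        self.mapped = False
        return True

    def remap(self) -> bool:
        """Best-effort rollback asking the user to map the GFN back."""
        if self.can_remap:
            self.mapped = True
        return self.mapped


def convert_shared_to_private(users):
    """Return (final shareability, names of users left unmapped)."""
    unmapped = []
    for user in users:
        if not user.invalidate():
            # Step 3: one notifier failed, so the conversion fails and
            # shareability stays shared. Try to restore earlier mappings.
            for earlier in unmapped:
                earlier.remap()
            lost = [u.name for u in unmapped if not u.mapped]
            return "shared", lost
        unmapped.append(user)
    return "private", []
```

With A able to re-map, B unable to re-map, and C failing the invalidation,
the model ends with shareability still "shared" while B has lost its mapping,
which is the state I am worried about.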
I noticed Ackerley has posted the series. Will check there later.
> >
> > Auditing whether multiple users of shared memory correctly perform unmapping is
> > harder than auditing reference counts.
> >
> > > private memory backed by page structs and use a special "filemap" to
> > > map file offsets to these private memory ranges. This step will also
> > > need similar contract with users -
> > > 1) memory is pinned by guest_memfd
> > > 2) users will get invalidation notifiers on shareability changes
> > >
> > > I am sure there is a lot of work here and many quirks to be addressed,
> > > let's discuss this more with better context around. A few related RFC
> > > series are planned to be posted in the near future.
> > Ok. Thanks for your time and discussions :)
> > ...