Message-ID: <Z4Q94E6JvWrhvCyq@yilunxu-OptiPlex-7050>
Date: Mon, 13 Jan 2025 06:10:40 +0800
From: Xu Yilun <yilun.xu@...ux.intel.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Christian König <christian.koenig@....com>,
Christoph Hellwig <hch@....de>, Leon Romanovsky <leonro@...dia.com>,
kvm@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
sumit.semwal@...aro.org, pbonzini@...hat.com, seanjc@...gle.com,
alex.williamson@...hat.com, vivek.kasireddy@...el.com,
dan.j.williams@...el.com, aik@....com, yilun.xu@...el.com,
linux-coco@...ts.linux.dev, linux-kernel@...r.kernel.org,
lukas@...ner.de, yan.y.zhao@...el.com, leon@...nel.org,
baolu.lu@...ux.intel.com, zhenzhong.duan@...el.com,
tao1.su@...el.com
Subject: Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI
On Fri, Jan 10, 2025 at 04:38:38PM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 10, 2025 at 08:34:55PM +0100, Simona Vetter wrote:
>
> > So if I'm getting this right, what you need from a functional pov is a
> > dma_buf_tdx_mmap? Because due to tdx restrictions, the normal dma_buf_mmap
I'm not sure 'mmap' is the proper word here.
It is more like a mapping from (FD + offset) to the backend memory,
provided directly by the memory provider rather than through a VMA
and the CPU page table. The VMA & CPU page table exist so the host
can access the memory, but the VMM/host never touches most of the
guest memory, so why build them at all?
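Just to illustrate what I mean (the names below are made up, not the
exact kAPI from patch 01): the importer asks the exporter to resolve an
offset into the FD straight to a pfn, with no VMA or CPU page table in
the path:

/* hypothetical lookup provided by the exporter */
int dma_buf_lookup_pfn(struct dma_buf *dmabuf, pgoff_t pgoff,
		       unsigned long *pfn);

/* importer (e.g. KVM) resolving a dma-buf backed guest page */
static int my_importer_fault(struct dma_buf *dmabuf, pgoff_t pgoff)
{
	unsigned long pfn;
	int ret;

	ret = dma_buf_lookup_pfn(dmabuf, pgoff, &pfn);
	if (ret)
		return ret;

	/* install the pfn into the importer's own (EPT/IOMMU) tables */
	return my_importer_install_pfn(pfn);
}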
> > is not going to work I guess?
>
> Don't want something TDX specific!
>
> There is a general desire, and CC is one, but there are other
> motivations like performance, to stop using VMAs and mmaps as a way to
> exchange memory between two entities. Instead we want to use FDs.
Exactly.
>
> We now have memfd and guestmemfd that are usable with
> memfd_pin_folios() - this covers pinnable CPU memory.
>
> And for a long time we had DMABUF which is for all the other wild
> stuff, and it supports movable memory too.
>
> So, the normal DMABUF semantics with reservation locking and move
> notifiers seem workable to me here. They are broadly similar enough to
> the mmu notifier locking that they can serve the same job of updating
> page tables.
Yes. With this new sharing model, the lifecycle of the shared memory/pfn/page
is controlled directly by the dma-buf exporter, not by the CPU mapping. So I
also think the reservation lock & move_notify work well for lifecycle control,
and there is no conflict with (in fact nothing to do with) follow_pfn() &
mmu_notifier.
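To make that concrete, the importer-side pattern I have in mind is roughly
the one below. Everything named my_importer_* is made up for illustration;
the attach ops and dma_resv_assert_held() are the existing dma-buf pieces:

/*
 * The exporter moves/revokes the memory under the reservation lock; the
 * importer tears down its own mappings from move_notify, analogous to
 * an mmu_notifier invalidation.
 */
static void my_importer_move_notify(struct dma_buf_attachment *attach)
{
	struct my_importer *imp = attach->importer_priv;

	dma_resv_assert_held(attach->dmabuf->resv);

	/*
	 * Zap the pfns we installed (EPT / IOMMU page tables, ...);
	 * re-acquire them later, or never for the revocable case.
	 */
	my_importer_zap_mappings(imp);
}

static const struct dma_buf_attach_ops my_importer_attach_ops = {
	.allow_peer2peer = true,
	.move_notify	 = my_importer_move_notify,
};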
>
> > Also another thing that's a bit tricky is that kvm kinda has a 3rd dma-buf
> > memory model:
> > - permanently pinned dma-buf, they never move
> > - dynamic dma-buf, they move through ->move_notify and importers can remap
> > - revocable dma-buf, which thus far only exist for pci mmio resources
>
> I would like to see the importers be able to discover which one is
> going to be used, because we have RDMA cases where we can support 1
> and 3 but not 2.
>
> revocable doesn't require page faulting as it is a terminal condition.
>
> > Since we're leaning even more on that 3rd model I'm wondering whether we
> > should make it something official. Because the existing dynamic importers
> > do very much assume that re-acquiring the memory after move_notify will
> > work. But for the revocable use-case the entire point is that it will
> > never work.
>
> > I feel like that's a concept we need to make explicit, so that dynamic
> > importers can reject such memory if necessary.
>
> It strikes me as strange that HW can do page faulting, so it can
> support #2, but it can't handle a non-present fault?
>
> > So yeah there's a bunch of tricky lifetime questions that need to be
> > sorted out with proper design I think, and the current "let's just use pfn
> > directly" proposal hides them all under the rug.
>
> I don't think these two things are connected. The lifetime model that
> KVM needs to work with the EPT, and that VFIO needs for its MMIO,
> definitely should be reviewed and evaluated.
>
> But it is completely orthogonal to allowing iommufd and kvm to access
> the CPU PFN to use in their mapping flows, instead of the
> dma_addr_t.
>
> What I want to get to is a replacement for scatter list in DMABUF that
> is an array of arrays, roughly like:
>
> struct memory_chunks {
>         struct memory_p2p_provider *provider;
>         struct bio_vec addrs[];
> };
> int (*dmabuf_get_memory)(struct memory_chunks **chunks, size_t *num_chunks);
Maybe we need to specify which object the API operates on: struct
dma_buf, struct dma_buf_attachment, or a new kind of attachment.

I think:

int (*dmabuf_get_memory)(struct dma_buf_attachment *attach,
			 struct memory_chunks **chunks,
			 size_t *num_chunks);

works, but maybe a new attachment type is conceptually clearer to
importers and harder to abuse?
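As a strawman, the attachment based variant could look roughly like the
below; every name other than struct dma_buf_attachment and
dma_resv_assert_held() is made up for illustration:

/* exporter-provided op, keyed on the attachment so the exporter can
 * track per-importer state and know whom to move_notify/revoke */
struct dmabuf_phys_ops {
	int (*get_memory)(struct dma_buf_attachment *attach,
			  struct memory_chunks **chunks,
			  size_t *num_chunks);
};

/* importer side, called under the dma-buf reservation lock */
static int my_importer_map(struct dma_buf_attachment *attach,
			   const struct dmabuf_phys_ops *ops)
{
	struct memory_chunks *chunks;
	size_t num_chunks;
	int ret;

	dma_resv_assert_held(attach->dmabuf->resv);

	ret = ops->get_memory(attach, &chunks, &num_chunks);
	if (ret)
		return ret;

	/* walk chunks[] and install the addrs into EPT/IOMMU tables */
	return my_importer_install_chunks(chunks, num_chunks);
}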
Thanks,
Yilun
>
> This can represent all forms of memory: P2P, private, CPU, etc and
> would be efficient with the new DMA API.
>
> This is similar to the structure BIO has, and it composes nicely with
> a future pin_user_pages() and memfd_pin_folios().
>
> Jason