Message-ID: <Z4Q94E6JvWrhvCyq@yilunxu-OptiPlex-7050>
Date: Mon, 13 Jan 2025 06:10:40 +0800
From: Xu Yilun <yilun.xu@...ux.intel.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Christian König <christian.koenig@....com>,
Christoph Hellwig <hch@....de>, Leon Romanovsky <leonro@...dia.com>,
kvm@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
sumit.semwal@...aro.org, pbonzini@...hat.com, seanjc@...gle.com,
alex.williamson@...hat.com, vivek.kasireddy@...el.com,
dan.j.williams@...el.com, aik@....com, yilun.xu@...el.com,
linux-coco@...ts.linux.dev, linux-kernel@...r.kernel.org,
lukas@...ner.de, yan.y.zhao@...el.com, leon@...nel.org,
baolu.lu@...ux.intel.com, zhenzhong.duan@...el.com,
tao1.su@...el.com
Subject: Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI
On Fri, Jan 10, 2025 at 04:38:38PM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 10, 2025 at 08:34:55PM +0100, Simona Vetter wrote:
>
> > So if I'm getting this right, what you need from a functional pov is a
> > dma_buf_tdx_mmap? Because due to tdx restrictions, the normal dma_buf_mmap
I'm not sure 'mmap' is the proper word here.
It is more like a mapping from (FD + offset) to the backend memory,
provided directly by the memory provider rather than through a VMA
and the CPU page table. The VMA & CPU page table exist so the host
can access the memory, but the VMM/host never touches most of the
guest memory, so why build them at all?
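Just to illustrate what I mean (the names below are made up, not the
exact kAPI from patch 01): the importer asks the exporter to resolve an
offset into the FD straight to a pfn, with no VMA or CPU page table in
the path:

/* hypothetical lookup provided by the exporter */
int dma_buf_lookup_pfn(struct dma_buf *dmabuf, pgoff_t pgoff,
		       unsigned long *pfn);

/* importer (e.g. KVM) resolving a dma-buf backed guest page */
static int my_importer_fault(struct dma_buf *dmabuf, pgoff_t pgoff)
{
	unsigned long pfn;
	int ret;

	ret = dma_buf_lookup_pfn(dmabuf, pgoff, &pfn);
	if (ret)
		return ret;

	/* install the pfn into the importer's own (EPT/IOMMU) tables */
	return my_importer_install_pfn(pfn);
}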
> > is not going to work I guess?
>
> Don't want something TDX specific!
>
> There is a general desire, and CC is one, but there are other
> motivations like performance, to stop using VMAs and mmaps as a way to
> exchange memory between two entities. Instead we want to use FDs.
Exactly.
>
> We now have memfd and guestmemfd that are usable with
> memfd_pin_folios() - this covers pinnable CPU memory.
>
> And for a long time we had DMABUF which is for all the other wild
> stuff, and it supports movable memory too.
>
> So, the normal DMABUF semantics with reservation locking and move
> notifiers seem workable to me here. They are broadly similar enough to
> the mmu notifier locking that they can serve the same job of updating
> page tables.
Yes. With this new sharing model, the lifecycle of the shared memory/pfn/page
is controlled directly by the dma-buf exporter, not by the CPU mapping. So I
also think the reservation lock & move_notify work well for lifecycle control,
and there is no conflict with (in fact nothing to do with) follow_pfn() &
mmu_notifier.
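To make that concrete, the importer-side pattern I have in mind is roughly
the one below. Everything named my_importer_* is made up for illustration;
the attach ops and dma_resv_assert_held() are the existing dma-buf pieces:

/*
 * The exporter moves/revokes the memory under the reservation lock; the
 * importer tears down its own mappings from move_notify, analogous to
 * an mmu_notifier invalidation.
 */
static void my_importer_move_notify(struct dma_buf_attachment *attach)
{
	struct my_importer *imp = attach->importer_priv;

	dma_resv_assert_held(attach->dmabuf->resv);

	/*
	 * Zap the pfns we installed (EPT / IOMMU page tables, ...);
	 * re-acquire them later, or never for the revocable case.
	 */
	my_importer_zap_mappings(imp);
}

static const struct dma_buf_attach_ops my_importer_attach_ops = {
	.allow_peer2peer = true,
	.move_notify	 = my_importer_move_notify,
};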
>
> > Also another thing that's a bit tricky is that kvm kinda has a 3rd dma-buf
> > memory model:
> > - permanently pinned dma-buf, they never move
> > - dynamic dma-buf, they move through ->move_notify and importers can remap
> > - revocable dma-buf, which thus far only exist for pci mmio resources
>
> I would like to see the importers be able to discover which one is
> going to be used, because we have RDMA cases where we can support 1
> and 3 but not 2.
>
> revocable doesn't require page faulting as it is a terminal condition.
>
> > Since we're leaning even more on that 3rd model I'm wondering whether we
> > should make it something official. Because the existing dynamic importers
> > do very much assume that re-acquiring the memory after move_notify will
> > work. But for the revocable use-case the entire point is that it will
> > never work.
>
> > I feel like that's a concept we need to make explicit, so that dynamic
> > importers can reject such memory if necessary.
>
> It strikes me as strange that HW can do page faulting, so it can
> support #2, but it can't handle a non-present fault?
>
> > So yeah there's a bunch of tricky lifetime questions that need to be
> > sorted out with proper design I think, and the current "let's just use pfn
> > directly" proposal hides them all under the rug.
>
> I don't think these two things are connected. The lifetime model that
> KVM needs to work with the EPT, and that VFIO needs for its MMIO,
> definitely should be reviewed and evaluated.
>
> But it is completely orthogonal to allowing iommufd and kvm to access
> the CPU PFN to use in their mapping flows, instead of the
> dma_addr_t.
>
> What I want to get to is a replacement for scatter list in DMABUF that
> is an array of arrays, roughly like:
>
> struct memory_chunks {
>         struct memory_p2p_provider *provider;
>         struct bio_vec addrs[];
> };
> int (*dmabuf_get_memory)(struct memory_chunks **chunks, size_t *num_chunks);
Maybe we need to specify which object the API operates on: struct
dma_buf, struct dma_buf_attachment, or a new kind of attachment.

I think:

int (*dmabuf_get_memory)(struct dma_buf_attachment *attach,
			 struct memory_chunks **chunks,
			 size_t *num_chunks);

works, but maybe a new attachment type is conceptually clearer to
importers and harder to abuse?
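As a strawman, the attachment based variant could look roughly like the
below; every name other than struct dma_buf_attachment and
dma_resv_assert_held() is made up for illustration:

/* exporter-provided op, keyed on the attachment so the exporter can
 * track per-importer state and know whom to move_notify/revoke */
struct dmabuf_phys_ops {
	int (*get_memory)(struct dma_buf_attachment *attach,
			  struct memory_chunks **chunks,
			  size_t *num_chunks);
};

/* importer side, called under the dma-buf reservation lock */
static int my_importer_map(struct dma_buf_attachment *attach,
			   const struct dmabuf_phys_ops *ops)
{
	struct memory_chunks *chunks;
	size_t num_chunks;
	int ret;

	dma_resv_assert_held(attach->dmabuf->resv);

	ret = ops->get_memory(attach, &chunks, &num_chunks);
	if (ret)
		return ret;

	/* walk chunks[] and install the addrs into EPT/IOMMU tables */
	return my_importer_install_chunks(chunks, num_chunks);
}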
Thanks,
Yilun
>
> This can represent all forms of memory: P2P, private, CPU, etc and
> would be efficient with the new DMA API.
>
> This is similar to the structure BIO has, and it composes nicely with
> a future pin_user_pages() and memfd_pin_folios().
>
> Jason