Message-ID: <Z4psR1qoNQUQf3Q2@phenom.ffwll.local>
Date: Fri, 17 Jan 2025 15:42:15 +0100
From: Simona Vetter <simona.vetter@...ll.ch>
To: Christian König <christian.koenig@....com>
Cc: Jason Gunthorpe <jgg@...dia.com>, Xu Yilun <yilun.xu@...ux.intel.com>,
Christoph Hellwig <hch@....de>, Leon Romanovsky <leonro@...dia.com>,
kvm@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
sumit.semwal@...aro.org, pbonzini@...hat.com, seanjc@...gle.com,
alex.williamson@...hat.com, vivek.kasireddy@...el.com,
dan.j.williams@...el.com, aik@....com, yilun.xu@...el.com,
linux-coco@...ts.linux.dev, linux-kernel@...r.kernel.org,
lukas@...ner.de, yan.y.zhao@...el.com, leon@...nel.org,
baolu.lu@...ux.intel.com, zhenzhong.duan@...el.com,
tao1.su@...el.com
Subject: Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked()
kAPI
On Wed, Jan 15, 2025 at 11:06:53AM +0100, Christian König wrote:
> Am 15.01.25 um 09:55 schrieb Simona Vetter:
> > > > If we add something
> > > > new, we need clear rules and not just "here's the kvm code that uses it".
> > > > That's how we did dma-buf at first, and it was a terrible mess of
> > > > mismatched expectations.
> > > Yes, that would be wrong. It should be self-defined within dma-buf and
> > > kvm should adapt to it, move semantics and all.
> > Ack.
> >
> > I feel like we have a plan here.
>
> I think I have to object a bit on that.
>
> > Summary from my side:
> >
> > - Sort out pin vs revocable vs dynamic/moveable semantics, make sure
> > importers have no surprises.
> >
> > - Adopt whatever new dma-api data structures pop out of the dma-api
> >   reworks.
> >
> > - Add pfn based memory access as yet another optional access method, with
> > helpers so that exporters who support this get all the others for free.
> >
> > I don't see a strict ordering between these, imo should be driven by
> > actual users of the dma-buf api.
> >
> > Already done:
> >
> > - dmem cgroup so that we can resource control device pinnings just landed
> > in drm-next for next merge window. So that part is imo sorted and we can
> > charge ahead with pinning into device memory without all the concerns
> > we've had years ago when discussing that for p2p dma-buf support.
> >
> >   But there might be some work needed so that we can do p2p pinning
> >   without requiring dynamic attachments; I haven't checked whether that
> >   needs adjustment in dma-buf.c code or just in exporters.
> >
> > Anything missing?
>
> Well as far as I can see this use case is not a good fit for the DMA-buf
> interfaces in the first place. DMA-buf deals with devices and buffer
> exchange.
>
> What's necessary here instead is to give an importing VM full access on some
> memory for their specific use case.
>
> That full access includes CPU and DMA mappings, modifying caching
> attributes, potentially setting encryption keys for specific ranges etc....
> etc...
>
> In other words we have a lot of things the importer here should be able to
> do which we don't want most of the DMA-buf importers to do.
This proposal isn't about forcing existing exporters to allow importers to
do new stuff. That stays as-is, because it would break things.
It's about adding yet another interface to get at the underlying data, and
we have tons of those already. The only difference is that if we don't
butcher the design, we'll be able to implement all the existing dma-buf
interfaces on top of this new pfn interface, for some neat maximal
compatibility.
But fundamentally there's never been an expectation that you can take any
arbitrary dma-buf, pass it to any arbitrary importer, and have it work. The
fundamental promise is that if it _does_ work, then
- it's zero copy
- and fast, or as fast as we can make it
I don't see this as any different from all the much more specific proposals
and existing code, where a subset of importers/exporters have special
rules so that e.g. gpu interconnect or vfio uuid-based sharing works.
pfn-based sharing is just yet another flavor that exists to get the max
amount of speed out of interconnects.
Cheers, Sima
>
> The semantics for things like pin vs revocable vs dynamic/moveable seems
> similar, but that's basically it.
>
> As far as I know the TEE subsystem also represents its allocations as file
> descriptors. If I'm not completely mistaken, this use case most likely fits
> better there.
>
> > I feel like this is small enough that m-l archives is good enough. For
> > some of the bigger projects we do in graphics we sometimes create entries
> > in our kerneldoc with wip design consensus and things like that. But
> > feels like overkill here.
> >
> > > My general desire is to move all of RDMA's MR process away from
> > > scatterlist and work using only the new DMA API. This will save *huge*
> > > amounts of memory in common workloads and be the basis for non-struct
> > > page DMA support, including P2P.
> > Yeah a more memory efficient structure than the scatterlist would be
> > really nice. That would even benefit the very special dma-buf exporters
> > where you cannot get a pfn and only the dma_addr_t, although most of
> > those (maybe even all?) have contig buffers, so your scatterlist has only
> > one entry. But it would definitely be nice from a design pov.
>
> Completely agree on that part.
>
> Scatterlists have some design flaws, especially mixing the input and output
> parameters of the DMA API in the same structure.
>
> In addition, DMA addresses are missing information about which bus they
> belong to and details of how the access should be made (e.g. snoop vs
> no-snoop etc...).
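As a purely illustrative sketch of the separation being described, the input
the mapper consumes and the output the device consumes could live in distinct
structures, with the output side carrying the metadata a bare dma_addr_t
lacks. All names and fields here are invented for the sake of the example,
not a proposal for the actual data structure:

```c
#include <stdint.h>
#include <stddef.h>

/* Input side: what the DMA mapping code consumes (CPU/physical view). */
struct phys_range {
	uint64_t phys;	/* physical address of the range */
	size_t   len;
};

/* Output side: what the device consumes, annotated with the metadata a
 * bare dma_addr_t is missing: which bus the address is valid on and how
 * the access should be made. */
struct dev_range {
	uint64_t dma_addr;
	size_t   len;
	int      bus_id;	/* hypothetical: bus the address belongs to */
	unsigned snoop : 1;	/* snoop vs no-snoop access */
};

/* A mapper would fill the output from the input plus per-device policy;
 * the two never share a structure, so neither clobbers the other. */
static struct dev_range map_range(const struct phys_range *in,
				  uint64_t iova, int bus_id, int snoop)
{
	return (struct dev_range){
		.dma_addr = iova,
		.len = in->len,
		.bus_id = bus_id,
		.snoop = snoop ? 1 : 0,
	};
}
```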
>
> > Aside: A way to more efficiently create compressed scatterlists would be
> > neat too, because a lot of drivers hand-roll that and it's a bit brittle
> > and kinda silly to duplicate. With compressed I mean just a single entry
> > for a contig range, in practice thanks to huge pages/folios and allocators
> > trying to hand out contig ranges if there's plenty of memory that saves a
> > lot of memory too. But currently it's a bit a pain to construct these
> > efficiently, mostly it's just a two-pass approach and then trying to free
> > surplus memory or krealloc to fit. Also I don't have good ideas here, but
> > dma-api folks might have some from looking at too many things that create
> > scatterlists.
>
> I mailed with Christoph about that a while back as well and we both agreed
> that it would probably be a good idea to start defining a data structure to
> better encapsulate DMA addresses.
>
> It's just that nobody has had time for that yet, and/or I wasn't looped in
> on the final discussion about it.
>
> Regards,
> Christian.
>
> > -Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch