Message-ID: <Z5BxI3efmCPSpOSK@yilunxu-OptiPlex-7050>
Date: Wed, 22 Jan 2025 12:16:35 +0800
From: Xu Yilun <yilun.xu@...ux.intel.com>
To: Christian König <christian.koenig@....com>
Cc: Jason Gunthorpe <jgg@...dia.com>, Christoph Hellwig <hch@....de>,
Leon Romanovsky <leonro@...dia.com>, kvm@...r.kernel.org,
dri-devel@...ts.freedesktop.org, linux-media@...r.kernel.org,
linaro-mm-sig@...ts.linaro.org, sumit.semwal@...aro.org,
pbonzini@...hat.com, seanjc@...gle.com, alex.williamson@...hat.com,
vivek.kasireddy@...el.com, dan.j.williams@...el.com, aik@....com,
yilun.xu@...el.com, linux-coco@...ts.linux.dev,
linux-kernel@...r.kernel.org, lukas@...ner.de, yan.y.zhao@...el.com,
leon@...nel.org, baolu.lu@...ux.intel.com, zhenzhong.duan@...el.com,
tao1.su@...el.com
Subject: Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI

On Mon, Jan 20, 2025 at 02:44:13PM +0100, Christian König wrote:
> On 21.06.24 at 00:02, Xu Yilun wrote:
> > On Thu, Jan 16, 2025 at 04:13:13PM +0100, Christian König wrote:
> > > On 15.01.25 at 18:09, Jason Gunthorpe wrote:
> > >
> > > On Wed, Jan 15, 2025 at 05:34:23PM +0100, Christian König wrote:
> > >
> > > Granted, let me try to improve this.
> > > Here is a real-world example of one of the issues we ran into, and why
> > > CPU mappings of importers are redirected to the exporter.
> > > We have a good bunch of different exporters which track the CPU mappings
> > > of their backing store using address_space objects in one way or
> > > another and then use unmap_mapping_range() to invalidate those CPU
> > > mappings.
> > > But when importers get the PFNs of the backing store they can look
> > > behind the curtain and directly insert those PFNs into the CPU page
> > > tables.
> > > We had literally tons of cases like this where driver developers caused
> > > access-after-free issues because the importer created CPU mappings on
> > > their own without the exporter knowing about it.
> > > This is just one example of what we ran into. In addition to that,
> > > basically the whole synchronization between drivers was overhauled as
> > > well because we found that we can't trust importers to always do the
> > > right thing.
> > >
> > > But this, fundamentally, is importers creating attachments and then
> > > *ignoring the lifetime rules of DMABUF*. If you created an attachment,
> > > got a move and *ignored the move* because you put the PFN in your own
> > > VMA, then you are not following the attachment lifetime rules!
> > >
> > > Move notify is solely for informing the importer that they need to
> > > refresh their DMA mappings and eventually block for ongoing DMA to end.
> > >
> > > These semantics don't work well for CPU mappings because you need to hold
> > > the reservation lock to make sure that the information stays valid, and you
> > > can't hold a lock while returning from a page fault.
> > Dealing with CPU mappings and resource invalidation is a little hard, but it
> > is resolvable by using other types of locks. And I guess for now dma-buf
> > exporters should always handle this CPU mapping vs. invalidation contention
> > if they support mmap().
> >
> > Since it is resolvable, with some invalidation notification a decent
> > importer could also handle the contention well.
>
> That doesn't work like this.
>
> See, page table updates under DMA-buf work by using the same locking
> approach for both the validation and invalidation side. In other words, we
> hold the same lock while inserting and removing entries into/from the page
> tables.
I'm not sure what issue that causes; maybe I don't get why "you can't
hold a lock while returning from a page fault".
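
To be specific about what I have in mind: a CPU fault handler can take
the reservation lock, insert the PTE, and drop the lock before
returning, something like the sketch below (just a sketch with made-up
importer_* names, not any existing driver's code):

	#include <linux/dma-resv.h>
	#include <linux/mm.h>

	struct importer_obj {
		struct dma_resv *resv;	/* the exporter's reservation object */
		unsigned long pfn;	/* backing store PFN, valid under resv */
	};

	static vm_fault_t importer_vm_fault(struct vm_fault *vmf)
	{
		struct importer_obj *obj = vmf->vma->vm_private_data;
		vm_fault_t ret;

		/* Take the same lock the invalidation side uses. */
		if (dma_resv_lock_interruptible(obj->resv, NULL))
			return VM_FAULT_NOPAGE;

		/* Insert the PFN while it is known to be valid. */
		ret = vmf_insert_pfn(vmf->vma, vmf->address, obj->pfn);

		/*
		 * Drop the lock before returning. The PTE stays behind,
		 * and the invalidation side removes it under the same
		 * lock, e.g. via unmap_mapping_range().
		 */
		dma_resv_unlock(obj->resv);
		return ret;
	}

So the lock is not held when returning from the fault, but the window
where the PFN must stay valid is fully covered by it.
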
>
> That this here should be an unlocked API means that you can only use it with
> pre-allocated and hard-pinned memory without any chance to invalidate it
> while running. Otherwise you can never be sure of the validity of the
Then importers can use a locked version to get the PFN, and take the
dma_resv lock only to ensure the PFN stays valid during page table setup.
Importers could detect via move notify that the PFN will become invalid
and remove the page table entries, then find the new PFN the next time a
page fault happens.
IIUC, Simona mentions drm/ttm is already doing it this way.
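
What I'm thinking of for the importer side is roughly the sketch below.
dma_buf_get_pfn_locked() is a hypothetical locked counterpart of the
kAPI in this patch, and the importer_* helpers stand in for the real
(KVM) page table work:

	#include <linux/dma-buf.h>
	#include <linux/dma-resv.h>

	struct importer {
		struct dma_buf_attachment *attach;
	};

	/* Hypothetical locked counterpart of dma_buf_get_pfn_unlocked(). */
	int dma_buf_get_pfn_locked(struct dma_buf_attachment *attach,
				   unsigned long index, u64 *pfn);

	/* Stand-ins for the importer's (KVM's) page table helpers. */
	void importer_zap_page_tables(struct importer *imp);
	void importer_map_pfn(struct importer *imp, unsigned long index, u64 pfn);

	/* Called by the exporter with the dma_resv lock already held. */
	static void importer_move_notify(struct dma_buf_attachment *attach)
	{
		struct importer *imp = attach->importer_priv;

		dma_resv_assert_held(attach->dmabuf->resv);

		/* Zap the entries now; the next access faults again. */
		importer_zap_page_tables(imp);
	}

	static const struct dma_buf_attach_ops importer_attach_ops = {
		.move_notify = importer_move_notify,
	};

	static int importer_fault(struct importer *imp, unsigned long index)
	{
		struct dma_buf *dmabuf = imp->attach->dmabuf;
		u64 pfn;
		int ret;

		ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
		if (ret)
			return ret;

		/* The PFN stays valid until the lock is dropped ... */
		ret = dma_buf_get_pfn_locked(imp->attach, index, &pfn);
		if (!ret)
			importer_map_pfn(imp, index, pfn);

		/* ... which is only after the page table entry is in place. */
		dma_resv_unlock(dmabuf->resv);
		return ret;
	}

Since move_notify is called with the same lock held, the zap and the
re-map are fully serialized, so the importer never installs a stale PFN.
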
I'm not trying to change the CPU mmap behavior of existing drivers, just
to ensure importer mapping is possible with a faultable MMU. I want the
KVM MMU (also faultable) to work in this importer mapping way.
Thanks,
Yilun