Message-ID: <20251020150412.GP6199@unreal>
Date: Mon, 20 Oct 2025 18:04:12 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Christoph Hellwig <hch@...radead.org>,
Alex Williamson <alex.williamson@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Christian König <christian.koenig@....com>,
dri-devel@...ts.freedesktop.org, iommu@...ts.linux.dev,
Jens Axboe <axboe@...nel.dk>, Joerg Roedel <joro@...tes.org>,
kvm@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-media@...r.kernel.org, linux-mm@...ck.org,
linux-pci@...r.kernel.org, Logan Gunthorpe <logang@...tatee.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Robin Murphy <robin.murphy@....com>,
Sumit Semwal <sumit.semwal@...aro.org>,
Vivek Kasireddy <vivek.kasireddy@...el.com>,
Will Deacon <will@...nel.org>
Subject: Re: [PATCH v5 1/9] PCI/P2PDMA: Separate the mmap() support from the
core logic
On Mon, Oct 20, 2025 at 09:58:54AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 20, 2025 at 05:27:02AM -0700, Christoph Hellwig wrote:
> > On Fri, Oct 17, 2025 at 08:53:20AM -0300, Jason Gunthorpe wrote:
> > > On Thu, Oct 16, 2025 at 11:30:06PM -0700, Christoph Hellwig wrote:
> > > > On Mon, Oct 13, 2025 at 06:26:03PM +0300, Leon Romanovsky wrote:
> > > > > The DMA API now has a new flow, and has gained phys_addr_t support, so
> > > > > it no longer needs struct pages to perform P2P mapping.
> > > >
> > > > That's news to me. All the pci_p2pdma_map_state machinery is still
> > > > based on pgmaps and thus pages.
> > >
> > > We had this discussion already three months ago:
> > >
> > > https://lore.kernel.org/all/20250729131502.GJ36037@nvidia.com/
> > >
> > > These couple patches make the core pci_p2pdma_map_state machinery work
> > > on struct p2pdma_provider, and pgmap is just one way to get a
> > > p2pdma_provider *
> > >
> > > The struct page paths through pgmap go page->pgmap->mem to get
> > > p2pdma_provider.
> > >
> > > The non-struct page paths just have a p2pdma_provider * without a
> > > pgmap. In this series VFIO uses
> > >
> > > + *provider = pcim_p2pdma_provider(pdev, bar);
> > >
> > > To get the provider for a specific BAR.
> >
> > And what protects that life time? I've not seen anyone actually
> > building the proper lifetime management. And if someone did the patches
> > need to clearly point to that.
>
> It is this series!
>
> The above API gives a lifetime that is driver bound. The calling
> driver must ensure it stops using provider and stops doing DMA with it
> before remove() completes.
>
> This VFIO series does that through the move_notify callchain I showed
> in the previous email. This callchain is always triggered before
> remove() of the VFIO PCI driver is completed.
>
> > > I think I've answered this three times now - for DMABUF the DMABUF
> > > invalidation scheme is used to control the lifetime and no DMA mapping
> > > outlives the provider, and the provider doesn't outlive the driver.
> >
> > How?
>
> I explained it in detail in the message you are replying to. If
> something is not clear can you please be more specific?
>
> Is it the mmap in VFIO perhaps that is causing these questions?
>
> VFIO uses a PFNMAP VMA, so you can't pin_user_page() it. It uses
> unmap_mapping_range() during its remove() path to get rid of the VMA
> PTEs.
>
> The DMA activity doesn't use the mmap *at all*. It isn't like NVMe
> which relies on the ZONE_DEVICE pages and VMAs to link drivers
> together.
>
> Instead the DMABUF FD is used to pass the MMIO pages between VFIO and
> another driver. DMABUF has a built in invalidation mechanism that VFIO
> triggers before remove(). The invalidation removes access from the
> other driver.
>
> This is different than NVMe which has no invalidation. NVMe does
> unmap_mapping_range() on the VMA and waits for all the short lived
> pgmap references to clear. We don't need anything like that because
> DMABUF invalidation is synchronous.
>
> The full picture for VFIO is something like:
>
> [startup]
> MMIO is acquired from the pci_resource
> p2p_providers are setup
>
> [runtime]
> MMIO is mapped into PFNMAP VMAs
> MMIO is linked to a DMABUF FD
> DMABUF FD gets DMA mapped using the p2p_provider
>
> [unplug]
> unmap_mapping_range() is called so all VMAs are emptied out and the
> fault handler prevents new PTEs
> ** No access to the MMIO through VMAs is possible**
>
> vfio_pci_dma_buf_cleanup() is called which prevents new DMABUF
> mappings from starting, and does dma_buf_move_notify() on all the
> open DMABUF FDs to invalidate other drivers. Other drivers stop
> doing DMA and we need to free the IOVA from the IOMMU/etc.
> ** No DMA access from other drivers is possible now**
>
> Any still open DMABUF FD will fail inside VFIO immediately due to
> the priv->revoked checks.
> **No code touches the p2p_provider anymore**
>
> The p2p_provider is destroyed by devm.
>
> > > Obviously you cannot use the new p2provider mechanism without some
> > > kind of protection against use after hot unplug, but it doesn't have
> > > to be struct page based.
> >
> > And how does this interact with everyone else expecting pgmap based
> > lifetime management.
>
> They continue to use pgmap and nothing changes for them.
>
> The pgmap path always waited until nothing was using the pgmap and
> thus provider before allowing device driver remove() to complete.
>
> The refactoring doesn't change the lifecycle model, it just provides
> entry points to access the driver-bound lifetime model directly
> instead of being forced to go through a pgmap.
>
> Leon, can you add some remarks to the comments about the rules for
> calling pcim_p2pdma_provider()?
Yes, sure.
Thanks
>
> Jason