Message-ID: <20251020125854.GL316284@nvidia.com>
Date: Mon, 20 Oct 2025 09:58:54 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: Leon Romanovsky <leon@...nel.org>,
Alex Williamson <alex.williamson@...hat.com>,
Leon Romanovsky <leonro@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Christian König <christian.koenig@....com>,
dri-devel@...ts.freedesktop.org, iommu@...ts.linux.dev,
Jens Axboe <axboe@...nel.dk>, Joerg Roedel <joro@...tes.org>,
kvm@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-media@...r.kernel.org, linux-mm@...ck.org,
linux-pci@...r.kernel.org, Logan Gunthorpe <logang@...tatee.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Robin Murphy <robin.murphy@....com>,
Sumit Semwal <sumit.semwal@...aro.org>,
Vivek Kasireddy <vivek.kasireddy@...el.com>,
Will Deacon <will@...nel.org>
Subject: Re: [PATCH v5 1/9] PCI/P2PDMA: Separate the mmap() support from the
core logic
On Mon, Oct 20, 2025 at 05:27:02AM -0700, Christoph Hellwig wrote:
> On Fri, Oct 17, 2025 at 08:53:20AM -0300, Jason Gunthorpe wrote:
> > On Thu, Oct 16, 2025 at 11:30:06PM -0700, Christoph Hellwig wrote:
> > > On Mon, Oct 13, 2025 at 06:26:03PM +0300, Leon Romanovsky wrote:
> > > > The DMA API now has a new flow, and has gained phys_addr_t support, so
> > > > it no longer needs struct pages to perform P2P mapping.
> > >
> > > That's news to me. All the pci_p2pdma_map_state machinery is still
> > > based on pgmaps and thus pages.
> >
> > We had this discussion already three months ago:
> >
> > https://lore.kernel.org/all/20250729131502.GJ36037@nvidia.com/
> >
> > These couple patches make the core pci_p2pdma_map_state machinery work
> > on struct p2pdma_provider, and pgmap is just one way to get a
> > p2pdma_provider *
> >
> > The struct page paths through pgmap go page->pgmap->mem to get
> > p2pdma_provider.
> >
> > The non-struct page paths just have a p2pdma_provider * without a
> > pgmap. In this series VFIO uses
> >
> > + *provider = pcim_p2pdma_provider(pdev, bar);
> >
> > To get the provider for a specific BAR.
>
> And what protects that life time? I've not seen anyone actually
> building the proper lifetime management. And if someone did the patches
> need to clearly point to that.
It is this series!
The above API gives a lifetime that is driver-bound. The calling
driver must ensure it stops using the provider and stops doing DMA
with it before remove() completes.
This VFIO series does that through the move_notify callchain I showed
in the previous email. This callchain is always triggered before
remove() of the VFIO PCI driver is completed.
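To make the contract concrete, here is a rough sketch of what a calling
driver has to do. The my_drv_* structure and names are illustrative
placeholders, not code from the series; only pcim_p2pdma_provider() is
the helper quoted above, and its exact return convention may differ:

/* needs <linux/pci.h> plus the pci-p2pdma helper from this series */
struct my_drv {
        struct p2pdma_provider *provider;   /* hypothetical driver state */
};

static int my_drv_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        struct my_drv *priv;

        priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
        if (!priv)
                return -ENOMEM;
        pci_set_drvdata(pdev, priv);

        /*
         * devm managed: the provider stays valid until this driver is
         * unbound, which is exactly the window it may be used in.
         */
        priv->provider = pcim_p2pdma_provider(pdev, 0 /* BAR index */);
        /* error handling elided; depends on the final return convention */

        return 0;
}

static void my_drv_remove(struct pci_dev *pdev)
{
        struct my_drv *priv = pci_get_drvdata(pdev);

        /*
         * Before this returns, no DMA mapping created from priv->provider
         * may still be live; in VFIO that is what the move_notify/revoke
         * path guarantees.
         */
        my_drv_stop_all_dma(priv);
}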
> > I think I've answered this three times now - for DMABUF the DMABUF
> > invalidation scheme is used to control the lifetime and no DMA mapping
> > outlives the provider, and the provider doesn't outlive the driver.
>
> How?
I explained it in detail in the message you are replying to. If
something is not clear can you please be more specific?
Is it the mmap in VFIO perhaps that is causing these questions?
VFIO uses a PFNMAP VMA, so you can't pin_user_pages() it. It uses
unmap_mapping_range() during its remove() path to get rid of the VMA
PTEs.
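For reference, that zap is essentially a one-liner. A sketch of what it
looks like, where vdev->inode is a placeholder for wherever the device
keeps its address_space, not necessarily the exact vfio-pci field:

/*
 * Drop every PTE that maps the BAR so userspace can no longer reach
 * the MMIO through the PFNMAP VMA. holelen == 0 means "to the end of
 * the mapping", and the final argument also zaps private COW copies.
 */
unmap_mapping_range(vdev->inode->i_mapping, 0, 0, true);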
The DMA activity doesn't use the mmap *at all*. It isn't like NVMe,
which relies on the ZONE_DEVICE pages and VMAs to link drivers
together.
Instead the DMABUF FD is used to pass the MMIO pages between VFIO and
another driver. DMABUF has a built in invalidation mechanism that VFIO
triggers before remove(). The invalidation removes access from the
other driver.
This is different from NVMe, which has no invalidation. NVMe does
unmap_mapping_range() on the VMA and waits for all the short-lived
pgmap references to clear. We don't need anything like that because
DMABUF invalidation is synchronous.
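A sketch of what that synchronous revoke looks like on the VFIO
(exporter) side. The dmabufs list and priv fields below are simplified
placeholders for the series' actual bookkeeping; only the dma-buf calls
are the real API:

/* walk every DMABUF exported for this device and revoke it */
list_for_each_entry(priv, &vdev->dmabufs, list) {
        dma_resv_lock(priv->dmabuf->resv, NULL);
        priv->revoked = true;
        /* importers must stop DMA and unmap before this returns */
        dma_buf_move_notify(priv->dmabuf);
        dma_resv_unlock(priv->dmabuf->resv);
}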
The full picture for VFIO is something like:

[startup]
  MMIO is acquired from the pci_resource
  p2p_providers are set up

[runtime]
  MMIO is mapped into PFNMAP VMAs
  MMIO is linked to a DMABUF FD
  The DMABUF FD gets DMA mapped using the p2p_provider

[unplug]
  unmap_mapping_range() is called so all VMAs are emptied out and the
  fault handler prevents new PTEs
  ** No access to the MMIO through VMAs is possible **

  vfio_pci_dma_buf_cleanup() is called, which prevents new DMABUF
  mappings from starting, and does dma_buf_move_notify() on all the
  open DMABUF FDs to invalidate other drivers. Other drivers stop
  doing DMA and must free their IOVA from the IOMMU/etc.
  ** No DMA access from other drivers is possible now **

  Any still-open DMABUF FD will fail inside VFIO immediately due to
  the priv->revoked checks.
  ** No code touches the p2p_provider anymore **

  The p2p_provider is destroyed by devm.
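The other-driver side of the [unplug] step above is just the normal
dynamic importer contract. Roughly, where the my_importer_* names are
invented and only the dma-buf structures and calls are real:

static void my_importer_move_notify(struct dma_buf_attachment *attach)
{
        struct my_importer *imp = attach->importer_priv;

        /* called with the dma_buf's reservation lock already held */
        my_importer_stop_dma(imp);
        dma_buf_unmap_attachment(attach, imp->sgt, DMA_BIDIRECTIONAL);
        imp->sgt = NULL;
}

static const struct dma_buf_attach_ops my_importer_attach_ops = {
        .allow_peer2peer = true,
        .move_notify = my_importer_move_notify,
};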
> > Obviously you cannot use the new p2provider mechanism without some
> > kind of protection against use after hot unplug, but it doesn't have
> > to be struct page based.
>
> And how does this interact with everyone else expecting pgmap based
> lifetime management.
They continue to use pgmap and nothing changes for them.
The pgmap path always waited until nothing was using the pgmap, and
thus the provider, before allowing device driver remove() to complete.
The refactoring doesn't change the lifecycle model; it just provides
entry points to access the driver-bound lifetime model directly
instead of being forced to go through a pgmap.
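Put differently, the only thing that changes is where the
p2pdma_provider pointer comes from. Roughly, where to_p2p_pgmap() is
p2pdma's internal container_of helper and the exact accessors may
differ in the final patches:

/* pgmap based users (unchanged): follow the page->pgmap->mem chain */
provider = &to_p2p_pgmap(page->pgmap)->mem;

/* struct-page-less users such as VFIO: driver-bound lifetime, no pgmap */
provider = pcim_p2pdma_provider(pdev, bar);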
Leon, can you add some remarks to the comments about what the rules
are for calling pcim_p2pdma_provider()?
Jason