Message-ID: <20251107160120.GD15456@unreal>
Date: Fri, 7 Nov 2025 18:01:20 +0200
From: Leon Romanovsky <leon@...nel.org>
To: Randy Dunlap <rdunlap@...radead.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
Logan Gunthorpe <logang@...tatee.com>, Jens Axboe <axboe@...nel.dk>,
Robin Murphy <robin.murphy@....com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Jason Gunthorpe <jgg@...pe.ca>,
Andrew Morton <akpm@...ux-foundation.org>,
Jonathan Corbet <corbet@....net>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
Kees Cook <kees@...nel.org>,
"Gustavo A. R. Silva" <gustavoars@...nel.org>,
Ankit Agrawal <ankita@...dia.com>,
Yishai Hadas <yishaih@...dia.com>,
Shameer Kolothum <skolothumtho@...dia.com>,
Kevin Tian <kevin.tian@...el.com>,
Alex Williamson <alex@...zbot.org>,
Krishnakant Jaju <kjaju@...dia.com>, Matt Ochs <mochs@...dia.com>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-block@...r.kernel.org, iommu@...ts.linux.dev,
linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linaro-mm-sig@...ts.linaro.org, kvm@...r.kernel.org,
linux-hardening@...r.kernel.org
Subject: Re: [PATCH v7 05/11] PCI/P2PDMA: Document DMABUF model
On Thu, Nov 06, 2025 at 10:15:07PM -0800, Randy Dunlap wrote:
>
>
> On 11/6/25 6:16 AM, Leon Romanovsky wrote:
> > From: Jason Gunthorpe <jgg@...dia.com>
> >
> > Reflect latest changes in p2p implementation to support DMABUF lifecycle.
> >
> > Signed-off-by: Leon Romanovsky <leonro@...dia.com>
> > Signed-off-by: Jason Gunthorpe <jgg@...dia.com>
> > ---
> > Documentation/driver-api/pci/p2pdma.rst | 95 +++++++++++++++++++++++++--------
> > 1 file changed, 72 insertions(+), 23 deletions(-)
> >
> > diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
> > index d0b241628cf1..69adea45f73e 100644
> > --- a/Documentation/driver-api/pci/p2pdma.rst
> > +++ b/Documentation/driver-api/pci/p2pdma.rst
> > @@ -9,22 +9,47 @@ between two devices on the bus. This type of transaction is henceforth
> > called Peer-to-Peer (or P2P). However, there are a number of issues that
> > make P2P transactions tricky to do in a perfectly safe way.
> >
> > -One of the biggest issues is that PCI doesn't require forwarding
> > -transactions between hierarchy domains, and in PCIe, each Root Port
> > -defines a separate hierarchy domain. To make things worse, there is no
> > -simple way to determine if a given Root Complex supports this or not.
> > -(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
> > -only supports doing P2P when the endpoints involved are all behind the
> > -same PCI bridge, as such devices are all in the same PCI hierarchy
> > -domain, and the spec guarantees that all transactions within the
> > -hierarchy will be routable, but it does not require routing
> > -between hierarchies.
> > -
> > -The second issue is that to make use of existing interfaces in Linux,
> > -memory that is used for P2P transactions needs to be backed by struct
> > -pages. However, PCI BARs are not typically cache coherent so there are
> > -a few corner case gotchas with these pages so developers need to
> > -be careful about what they do with them.
> > +For PCIe the routing of TLPs is well defined up until they reach a host bridge
>
> Define what TLP means?
In the PCIe world, TLP is a very well-known and well-defined acronym that
stands for Transaction Layer Packet.
> well-defined
Thanks
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
index 69adea45f73e..7530296a5dea 100644
--- a/Documentation/driver-api/pci/p2pdma.rst
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -9,17 +9,17 @@ between two devices on the bus. This type of transaction is henceforth
called Peer-to-Peer (or P2P). However, there are a number of issues that
make P2P transactions tricky to do in a perfectly safe way.
-For PCIe the routing of TLPs is well defined up until they reach a host bridge
-or root port. If the path includes PCIe switches then based on the ACS settings
-the transaction can route entirely within the PCIe hierarchy and never reach the
-root port. The kernel will evaluate the PCIe topology and always permit P2P
-in these well defined cases.
+For PCIe the routing of Transaction Layer Packets (TLPs) is well-defined up
+until they reach a host bridge or root port. If the path includes PCIe switches
+then based on the ACS settings the transaction can route entirely within
+the PCIe hierarchy and never reach the root port. The kernel will evaluate
+the PCIe topology and always permit P2P in these well-defined cases.
However, if the P2P transaction reaches the host bridge then it might have to
hairpin back out the same root port, be routed inside the CPU SOC to another
PCIe root port, or routed internally to the SOC.
-As this is not well defined or well supported in real HW the kernel defaults to
+As this is not well-defined or well supported in real HW the kernel defaults to
blocking such routing. There is an allow list to allow detecting known-good HW,
in which case P2P between any two PCIe devices will be permitted.
@@ -39,7 +39,7 @@ delegates lifecycle management to the providing driver. It is expected that
drivers using this option will wrap their MMIO memory in DMABUF and use DMABUF
to provide an invalidation shutdown. These MMIO pages have no struct page, and
if used with mmap() must create special PTEs. As such there are very few
-kernel uAPIs that can accept pointers to them, in particular they cannot be used
+kernel uAPIs that can accept pointers to them; in particular they cannot be used
with read()/write(), including O_DIRECT.
Building on this, the subsystem offers a layer to wrap the MMIO in a ZONE_DEVICE
@@ -154,7 +154,7 @@ access happens.
Usage With DMABUF
=================
-DMABUF provides an alternative to the above struct page based
+DMABUF provides an alternative to the above struct page-based
client/provider/orchestrator system. In this mode the exporting driver will wrap
some of its MMIO in a DMABUF and give the DMABUF FD to userspace.
@@ -162,10 +162,10 @@ Userspace can then pass the FD to an importing driver which will ask the
exporting driver to map it.
In this case the initiator and target pci_devices are known and the P2P subsystem
-is used to determine the mapping type. The phys_addr_t based DMA API is used to
+is used to determine the mapping type. The phys_addr_t-based DMA API is used to
establish the dma_addr_t.
-Lifecycle is controlled by DMABUF move_notify(), when the exporting driver wants
+Lifecycle is controlled by DMABUF move_notify(). When the exporting driver wants
to remove() it must deliver an invalidation shutdown to all DMABUF importing
drivers through move_notify() and synchronously DMA unmap all the MMIO.