Message-ID: <ad36ef4e-a485-4bbf-aaa9-67cd517ca018@amd.com>
Date: Wed, 19 Nov 2025 15:06:18 +0100
From: Christian König <christian.koenig@....com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Leon Romanovsky <leon@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Logan Gunthorpe <logang@...tatee.com>, Jens Axboe <axboe@...nel.dk>,
Robin Murphy <robin.murphy@....com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, Marek Szyprowski <m.szyprowski@...sung.com>,
Andrew Morton <akpm@...ux-foundation.org>, Jonathan Corbet <corbet@....net>,
Sumit Semwal <sumit.semwal@...aro.org>, Kees Cook <kees@...nel.org>,
"Gustavo A. R. Silva" <gustavoars@...nel.org>,
Ankit Agrawal <ankita@...dia.com>, Yishai Hadas <yishaih@...dia.com>,
Shameer Kolothum <skolothumtho@...dia.com>, Kevin Tian
<kevin.tian@...el.com>, Alex Williamson <alex@...zbot.org>,
Krishnakant Jaju <kjaju@...dia.com>, Matt Ochs <mochs@...dia.com>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-block@...r.kernel.org, iommu@...ts.linux.dev, linux-mm@...ck.org,
linux-doc@...r.kernel.org, linux-media@...r.kernel.org,
dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org,
kvm@...r.kernel.org, linux-hardening@...r.kernel.org
Subject: Re: [PATCH v8 05/11] PCI/P2PDMA: Document DMABUF model
On 11/19/25 14:35, Jason Gunthorpe wrote:
> On Wed, Nov 19, 2025 at 10:18:08AM +0100, Christian König wrote:
>>> +As this is not well-defined or well-supported in real HW the kernel defaults to
>>> +blocking such routing. There is an allow list to allow detecting known-good HW,
>>> +in which case P2P between any two PCIe devices will be permitted.
>>
>> That section sounds not correct to me.
>
> It is correct in that it describes what the kernel does right now.
>
> See calc_map_type_and_dist(), host_bridge_whitelist(), cpu_supports_p2pdma().

Well, I'm the one who originally suggested that whitelist, and the description still doesn't sound correct to me.

I would write something like "The PCIe specification doesn't define the forwarding of transactions between hierarchy domains...."

The previous text was actually much better than this summary, since this version leaves out the important information about where all of this comes from.

What the kernel does can be figured out by reading the code, but we need to describe why it does it.
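
To make the "why" concrete, here is a hypothetical, heavily simplified model of the routing decision the allow list feeds. Everything in it is assumed for illustration: struct model_dev, the P2P_* enum, the classify() helper and the host-bridge names are made up. It is not the kernel's code; the real logic lives in calc_map_type_and_dist(), host_bridge_whitelist() and cpu_supports_p2pdma(). The model captures one idea. Traffic that stays below a common PCIe switch never needs the root complex to forward between hierarchy domains. Anything else relies on behavior the PCIe specification leaves undefined, so only known-good host bridges are trusted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model; not the kernel's data structures. */
enum p2p_map_type {
	P2P_NOT_SUPPORTED,    /* default: block the transfer */
	P2P_BUS_ADDR,         /* both devices below a common PCIe switch */
	P2P_THRU_HOST_BRIDGE, /* forwarded by an allow-listed root complex */
};

struct model_dev {
	const char *name;
	int switch_id;           /* -1 if attached directly to a root port */
	const char *host_bridge; /* hypothetical host bridge identifier */
};

/* Hypothetical allow list of host bridges known to forward P2P traffic. */
static const char *const known_good_bridges[] = {
	"vendor-a-root",
	"vendor-b-root",
};

static bool bridge_allowed(const char *hb)
{
	size_t n = sizeof(known_good_bridges) / sizeof(known_good_bridges[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(hb, known_good_bridges[i]) == 0)
			return true;
	return false;
}

static enum p2p_map_type classify(const struct model_dev *a,
				  const struct model_dev *b)
{
	/* A common switch routes the TLPs itself; no hierarchy domains crossed. */
	if (a->switch_id >= 0 && a->switch_id == b->switch_id)
		return P2P_BUS_ADDR;

	/* Otherwise the root complex must forward between hierarchy domains,
	 * which the spec does not define: trust only known-good bridges. */
	if (bridge_allowed(a->host_bridge) && bridge_allowed(b->host_bridge))
		return P2P_THRU_HOST_BRIDGE;

	return P2P_NOT_SUPPORTED;
}
```

With this model, two devices behind switch 1 get P2P_BUS_ADDR regardless of the allow list. A device on a root port of an unlisted bridge gets P2P_NOT_SUPPORTED, which is the conservative default the documentation should explain.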
>
>> This is well supported in current HW, it's just not defined in some
>> official specification.
>
> Only AMD HW.
>
> Intel HW is a bit hit and miss.
>
> ARM SOCs are frequently not supporting even on server CPUs.

IIRC ARM actually has a validation program for this, but I've forgotten its name again.

Randy should know the name, and I think mentioning the status of the individual vendors here would be a good idea.
>>> +At the lowest level the P2P subsystem offers a naked struct p2p_provider that
>>> +delegates lifecycle management to the providing driver. It is expected that
>>> +drivers using this option will wrap their MMIO memory in DMABUF and use DMABUF
>>> +to provide an invalidation shutdown.
>>
>>> These MMIO pages have no struct page, and
>>
>> Well please drop "pages" here. Just say MMIO addresses.
>
> "These MMIO addresses have no struct page, and"
+1
>
>>> +Building on this, the subsystem offers a layer to wrap the MMIO in a ZONE_DEVICE
>>> +pgmap of MEMORY_DEVICE_PCI_P2PDMA to create struct pages. The lifecycle of
>>> +pgmap ensures that when the pgmap is destroyed all other drivers have stopped
>>> +using the MMIO. This option works with O_DIRECT flows, in some cases, if the
>>> +underlying subsystem supports handling MEMORY_DEVICE_PCI_P2PDMA through
>>> +FOLL_PCI_P2PDMA. The use of FOLL_LONGTERM is prevented. As this relies on pgmap
>>> +it also relies on architecture support along with alignment and minimum size
>>> +limitations.
>>
>> Actually that is up to the exporter of the DMA-buf what approach is used.
>
> The above is not talking about DMA-buf, it is describing the existing
> interface that is all struct page based. The driver invoking the
> P2PDMA APIs gets to pick if it uses the struct page based interface
> described above or the lower level p2p provider interface this series
> introduces.
>
>> For the P2PDMA API it should be irrelevant if struct pages are used or not.
>
> Only for the lowest level p2p provider based P2PDMA API - there is a
> higher level API family within P2PDMA's API that is all about creating
> and managing ZONE_DEVICE struct pages and a pgmap, the above describes
> that family.

I completely agree with all of this, but that's not what I meant.

The documentation makes it sound like DMA-buf is limited to not using struct pages and direct I/O, but that is not true.

You can have DMA-bufs backed by pages, both system memory and ZONE_DEVICE pages.

But DMA-buf can also handle PCIe MMIO BARs that hold microcontroller doorbells or even classic HW registers.
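
A minimal sketch of the point above, under stated assumptions: the enum, struct and helper below are hypothetical and do not correspond to any kernel API. They only model the three kinds of backing an exporter may choose. Two of the three have a struct page, so a description that ties DMA-buf to "no struct page" covers only one case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not a kernel API: what an exporter may place
 * behind a DMA-buf. */
enum dmabuf_backing_kind {
	BACKING_SYSTEM_PAGES, /* ordinary system memory, has struct page */
	BACKING_ZONE_DEVICE,  /* ZONE_DEVICE pages, e.g. a P2PDMA pgmap */
	BACKING_MMIO,         /* raw BAR range (doorbells, registers), no struct page */
};

struct dmabuf_backing {
	enum dmabuf_backing_kind kind;
	uint64_t phys_addr; /* start of the range (model only) */
	size_t len;         /* length in bytes */
};

/* Only the raw MMIO case lacks struct page; importers should assume neither. */
static bool backing_has_struct_page(const struct dmabuf_backing *b)
{
	return b->kind != BACKING_MMIO;
}
```

The choice of backing stays with the exporter, which is why the P2PDMA documentation should not imply a fixed answer for DMA-buf as a whole.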
Regards,
Christian.
>
> Thanks,
> Jason