Message-ID: <20250905173850.GB25881@unreal>
Date: Fri, 5 Sep 2025 20:38:50 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Marek Szyprowski <m.szyprowski@...sung.com>
Cc: Jason Gunthorpe <jgg@...dia.com>,
Abdiel Janulgue <abdiel.janulgue@...il.com>,
Alexander Potapenko <glider@...gle.com>,
Alex Gaynor <alex.gaynor@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Hellwig <hch@....de>, Danilo Krummrich <dakr@...nel.org>,
iommu@...ts.linux.dev, Jason Wang <jasowang@...hat.com>,
Jens Axboe <axboe@...nel.dk>, Joerg Roedel <joro@...tes.org>,
Jonathan Corbet <corbet@....net>, Juergen Gross <jgross@...e.com>,
kasan-dev@...glegroups.com, Keith Busch <kbusch@...nel.org>,
linux-block@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-nvme@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org,
linux-trace-kernel@...r.kernel.org,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
"Michael S. Tsirkin" <mst@...hat.com>,
Miguel Ojeda <ojeda@...nel.org>,
Robin Murphy <robin.murphy@....com>, rust-for-linux@...r.kernel.org,
Sagi Grimberg <sagi@...mberg.me>,
Stefano Stabellini <sstabellini@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
virtualization@...ts.linux.dev, Will Deacon <will@...nel.org>,
xen-devel@...ts.xenproject.org
Subject: Re: [PATCH v4 00/16] dma-mapping: migrate to physical address-based
API
On Fri, Sep 05, 2025 at 06:20:51PM +0200, Marek Szyprowski wrote:
> On 29.08.2025 15:16, Jason Gunthorpe wrote:
> > On Tue, Aug 19, 2025 at 08:36:44PM +0300, Leon Romanovsky wrote:
> >
> >> This series does the core code and modern flows. A followup series
> >> will give the same treatment to the legacy dma_ops implementation.
> > I took a quick check over this to see that it is sane. I think using
> > phys is an improvement for most of the dma_ops implementations.
> >
> > arch/sparc/kernel/pci_sun4v.c
> > arch/sparc/kernel/iommu.c
> > Uses __pa to get phys from the page, never touches page
> >
> > arch/alpha/kernel/pci_iommu.c
> > arch/sparc/mm/io-unit.c
> > drivers/parisc/ccio-dma.c
> > drivers/parisc/sba_iommu.c
> > Does page_address() and later does __pa on it. Doesn't touch struct page
> >
> > arch/x86/kernel/amd_gart_64.c
> > drivers/xen/swiotlb-xen.c
> > arch/mips/jazz/jazzdma.c
> > Immediately does page_to_phys(), never touches struct page
> >
> > drivers/vdpa/vdpa_user/vduse_dev.c
> > Does page_to_phys() to call iommu_map()
> >
> > drivers/xen/grant-dma-ops.c
> > Does page_to_pfn() and nothing else
> >
> > arch/powerpc/platforms/ps3/system-bus.c
> > This is a maze but I think it wants only phys and the virt is only
> > used for debug prints.
> >
> > The above all never touch a KVA and just want a phys_addr_t.
> >
> > The below are touching the KVA somehow:
> >
> > arch/sparc/mm/iommu.c
> > arch/arm/mm/dma-mapping.c
> > Uses page_address() for cache flushing; would be happy with phys_to_virt()
> > and a PhysHighMem()
> >
> > arch/powerpc/kernel/dma-iommu.c
> > arch/powerpc/platforms/pseries/vio.c
> > Uses iommu_map_page() which wants phys_to_virt(), doesn't touch
> > struct page
> >
> > arch/powerpc/platforms/pseries/ibmebus.c
> > Returns phys_to_virt() as dma_addr_t.
> >
> > The two PPC ones are weird, I didn't figure out how that was working..
> >
> > It would be easy to make map_phys patches for about half of these, in
> > the first grouping. Doing so would also grant those arches
> > map_resource capability.
> >
> > Overall I didn't think there was any reduction in maintainability in
> > these places. Most are improvements eliminating code, and some are
> > just switching to phys_to_virt() from page_address(), which we could
> > further guard with DMA_ATTR_MMIO and a check for highmem.
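[ Just to illustrate the guard meant here -- DMA_ATTR_MMIO is the
  attribute added by this series and PhysHighMem() does not exist yet,
  so this is only a sketch of the idea:

	if (!(attrs & DMA_ATTR_MMIO) && !PhysHighMem(paddr)) {
		void *kva = phys_to_virt(paddr);

		/* arch-specific cache maintenance on kva ... */
	}

  i.e. the KVA is only ever touched for lowmem RAM. ]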
>
> Thanks for this summary.
>
> However, I would still like to get an answer to a simple question:
> why can't all this work be replaced by a simple use of dma_map_resource()?
>
> I've checked the most advertised use case in
> https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dmabuf-vfio
> and I still don't see the reason why it cannot be based on the
> dma_map_resource() API. I'm aware of the slight asymmetry of the
> client calls in such a case; indeed it is not pretty, but this should
> work even now:
>
> phys = phys_vec[i].paddr;
>
> if (is_mmio)
>         dma_map_resource(phys, len, ...);
> else
>         dma_map_page(phys_to_page(phys), offset_in_page(phys), ...);
>
> What did I miss?
"Even now" can't work mainly because both of these interfaces don't
support p2p case (PCI_P2PDMA_MAP_BUS_ADDR).
It is unclear how to extend them without introducing new functions
and/or changing whole kernel. In PCI_P2PDMA_MAP_BUS_ADDR case, there
is no struct page, so dma_map_page() is unlikely to be possible to
extend and dma_map_resource() has no direct way to access PCI
bus_offset. In theory, it is doable, but will be layer violation as DMA
will need to rely on PCI layer for address calculations.
If we don't extend them, then in the general case (for HMM, RDMA and
NVMe) the end result will be something like this:

	if (...PCI_P2PDMA_MAP_BUS_ADDR)
		pci_p2pdma_bus_addr_map
	else if (mmio)
		dma_map_resource
	else	/* <- this case is not applicable to VFIO-DMABUF */
		dma_map_page
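Fleshed out a bit, every caller would need to open-code roughly the
following (only a sketch: map_type, p2pdma_state and is_mmio are
placeholders here, and the pci_p2pdma_bus_addr_map() arguments are
simplified):

	dma_addr_t addr;

	if (map_type == PCI_P2PDMA_MAP_BUS_ADDR) {
		/* P2P through the PCI switch: no IOVA, no struct page */
		addr = pci_p2pdma_bus_addr_map(&p2pdma_state, phys);
	} else if (is_mmio) {
		addr = dma_map_resource(dev, phys, len, dir, attrs);
	} else {
		/* only this branch has (and needs) a struct page */
		addr = dma_map_page(dev, phys_to_page(phys),
				    offset_in_page(phys), len, dir);
	}
	/* plus per-branch error handling and matching unmap paths */

Note that the unmap side has to remember which of the three branches
was taken, too.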
Even if we somehow extend these functions to support it, we will lose
a very important optimization where we perform one IOTLB sync for the
whole DMABUF region (== dma_iova_state), and I was told that it is a
very large region.
103	for (i = 0; i < priv->nr_ranges; i++) {
<...>
107		} else if (dma_use_iova(state)) {
108			ret = dma_iova_link(attachment->dev, state,
109					    phys_vec[i].paddr, 0,
110					    phys_vec[i].len, dir, attrs);
111			if (ret)
112				goto err_unmap_dma;
113
114			mapped_len += phys_vec[i].len;
<...>
132	}
133
134	if (state && dma_use_iova(state)) {
135		WARN_ON_ONCE(mapped_len != priv->size);
136		ret = dma_iova_sync(attachment->dev, state, 0, mapped_len);
>
> I'm not against this rework, but I would really like to know the
> rationale. I know that the 2-step dma-mapping API also uses phys
> addresses and that this goes in the same direction.
This series is a continuation of the 2-step dma-mapping API. The plan
to provide dma_map_phys() was there from the beginning.
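For the non-p2p branches above, the same call site then collapses into
one call that covers both RAM and MMIO (a sketch, following the
dma_map_phys() signature and the DMA_ATTR_MMIO attribute proposed in
this series):

	addr = dma_map_phys(dev, phys, len, dir,
			    is_mmio ? DMA_ATTR_MMIO : 0);
	if (dma_mapping_error(dev, addr))
		goto err;

dma_unmap_phys() takes the same attrs, so the caller doesn't need to
remember which flavour was used when unmapping.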
Thanks