Message-ID: <20250905174324.GI616306@nvidia.com>
Date: Fri, 5 Sep 2025 14:43:24 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Marek Szyprowski <m.szyprowski@...sung.com>
Cc: Leon Romanovsky <leon@...nel.org>,
Abdiel Janulgue <abdiel.janulgue@...il.com>,
Alexander Potapenko <glider@...gle.com>,
Alex Gaynor <alex.gaynor@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Hellwig <hch@....de>, Danilo Krummrich <dakr@...nel.org>,
iommu@...ts.linux.dev, Jason Wang <jasowang@...hat.com>,
Jens Axboe <axboe@...nel.dk>, Joerg Roedel <joro@...tes.org>,
Jonathan Corbet <corbet@....net>, Juergen Gross <jgross@...e.com>,
kasan-dev@...glegroups.com, Keith Busch <kbusch@...nel.org>,
linux-block@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-nvme@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org,
linux-trace-kernel@...r.kernel.org,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
"Michael S. Tsirkin" <mst@...hat.com>,
Miguel Ojeda <ojeda@...nel.org>,
Robin Murphy <robin.murphy@....com>, rust-for-linux@...r.kernel.org,
Sagi Grimberg <sagi@...mberg.me>,
Stefano Stabellini <sstabellini@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
virtualization@...ts.linux.dev, Will Deacon <will@...nel.org>,
xen-devel@...ts.xenproject.org
Subject: Re: [PATCH v4 00/16] dma-mapping: migrate to physical address-based
API
On Fri, Sep 05, 2025 at 06:20:51PM +0200, Marek Szyprowski wrote:
> I've checked the most advertised use case in
> https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dmabuf-vfio
> and I still don't see the reason why it cannot be based
> on the dma_map_resource() API? I'm aware of the little asymmetry of
> the client calls in such a case, indeed it is not pretty, but this
> should work even now:
>
> 	phys = phys_vec[i].paddr;
>
> 	if (is_mmio)
> 		dma_map_resource(phys, len, ...);
> 	else
> 		dma_map_page(phys_to_page(phys), offset_in_page(phys), ...);
>
> What did I miss?
I have a somewhat different answer than Leon..
The link path would need a resource variation too:
+		ret = dma_iova_link(attachment->dev, state,
+				    phys_vec[i].paddr, 0,
+				    phys_vec[i].len, dir, attrs);
+		if (ret)
+			goto err_unmap_dma;
+
+		mapped_len += phys_vec[i].len;
It is an existing bug that we don't properly handle all details of
MMIO for link.
Since this already takes a phys_addr_t, I wouldn't strongly argue
against handling it by adding ATTR_MMIO to dma_iova_link().
If you did that, then you'd still want a dma_(un)map_phys() helper
that handled ATTR_MMIO too. It could be an inline "if () resource else
page" wrapper like you say.
So API wise I think we have the right design here.
I think the question you are asking is how much change to the
internals of the DMA API you want to make for ATTR_MMIO. It is not
zero, but there is some minimum that is less than what this series
does.
So, reason #1: much of this ATTR_MMIO work is needed anyhow. Being
consistent and unifying the dma_map_resource path with ATTR_MMIO
should improve the long term maintainability of the code. We already
uncovered paths where map_resource does not behave consistently with
map_page, and it is unclear whether these are bugs or deliberate.
Reason #2: we actually do want to get rid of struct page usage to help
advance Matthew's work. This means we want to build a clean,
struct-page-less path for IO: we can do phys to virt, or kmap a phys,
but none of phys to page, page to virt, or page to phys. Stopping at a
phys based public API and then leaving all the phys-to-page/etc
conversions hidden inside is not enough.
This is why I was looking at the dma_ops path, to see just how much
page usage there is, and I found very little. So this dream is
achievable and with this series we are there for ARM64 and x86
environments.
> This patchset focuses only on the dma_map_page -> dma_map_phys rework.
> There are also other interfaces, like dma_alloc_pages(), and so far
> nothing has been proposed for them.
That's because they already have non-page alternatives.
Almost all places call dma_alloc_noncoherent():
static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
{
	struct page *page = dma_alloc_pages(dev, size, dma_handle, dir, gfp);

	return page ? page_address(page) : NULL;
}
Which is KVA based.
There is only one other user I found of dma_alloc_pages():
drivers/firewire/ohci.c: ctx->pages[i] = dma_alloc_pages(dev, PAGE_SIZE, &dma_addr,
And it deliberately uses page->private:
set_page_private(ctx->pages[i], dma_addr);
So it is correct to use the struct page API.
Some usages of dma_alloc_noncontiguous() can be implemented using the
dma_iova_link() flow, as drivers/vfio/pci/mlx5/cmd.c shows by using
alloc_pages_bulk() for the allocator. We don't yet have a 'dma alloc
link' operation though, and there are only 4 users of
dma_alloc_noncontiguous().
Jason