[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251119145007.GJ18335@unreal>
Date: Wed, 19 Nov 2025 16:50:07 +0200
From: Leon Romanovsky <leon@...nel.org>
To: Christian König <christian.koenig@....com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
Logan Gunthorpe <logang@...tatee.com>, Jens Axboe <axboe@...nel.dk>,
Robin Murphy <robin.murphy@....com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Jason Gunthorpe <jgg@...pe.ca>,
Andrew Morton <akpm@...ux-foundation.org>,
Jonathan Corbet <corbet@....net>,
Sumit Semwal <sumit.semwal@...aro.org>, Kees Cook <kees@...nel.org>,
"Gustavo A. R. Silva" <gustavoars@...nel.org>,
Ankit Agrawal <ankita@...dia.com>,
Yishai Hadas <yishaih@...dia.com>,
Shameer Kolothum <skolothumtho@...dia.com>,
Kevin Tian <kevin.tian@...el.com>,
Alex Williamson <alex@...zbot.org>,
Krishnakant Jaju <kjaju@...dia.com>, Matt Ochs <mochs@...dia.com>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-block@...r.kernel.org, iommu@...ts.linux.dev,
linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linaro-mm-sig@...ts.linaro.org, kvm@...r.kernel.org,
linux-hardening@...r.kernel.org, Alex Mastro <amastro@...com>,
Nicolin Chen <nicolinc@...dia.com>
Subject: Re: [Linaro-mm-sig] [PATCH v8 06/11] dma-buf: provide phys_vec to
scatter-gather mapping routine
On Wed, Nov 19, 2025 at 03:11:01PM +0100, Christian König wrote:
> On 11/19/25 14:42, Leon Romanovsky wrote:
> > On Wed, Nov 19, 2025 at 02:16:57PM +0100, Christian König wrote:
> >>
> >>
> >> On 11/11/25 10:57, Leon Romanovsky wrote:
> >>> From: Leon Romanovsky <leonro@...dia.com>
> >>>
> >>> Add dma_buf_map() and dma_buf_unmap() helpers to convert an array of
> >>> MMIO physical address ranges into scatter-gather tables with proper
> >>> DMA mapping.
> >>>
> >>> These common functions are a starting point and support any PCI
> >>> drivers creating mappings from their BAR's MMIO addresses. VFIO is one
> >>> case, as shortly will be RDMA. We can review existing DRM drivers to
> >>> refactor them separately. We hope this will evolve into routines to
> >>> help common DRM that include mixed CPU and MMIO mappings.
> >>>
> >>> Compared to the dma_map_resource() abuse this implementation handles
> >>> the complicated PCI P2P scenarios properly, especially when an IOMMU
> >>> is enabled:
> >>>
> >>> - Direct bus address mapping without IOVA allocation for
> >>> PCI_P2PDMA_MAP_BUS_ADDR, using pci_p2pdma_bus_addr_map(). This
> >>> happens if the IOMMU is enabled but the PCIe switch ACS flags allow
> >>> transactions to avoid the host bridge.
> >>>
> >>> Further, this handles the slightly obscure, case of MMIO with a
> >>> phys_addr_t that is different from the physical BAR programming
> >>> (bus offset). The phys_addr_t is converted to a dma_addr_t and
> >>> accommodates this effect. This enables certain real systems to
> >>> work, especially on ARM platforms.
> >>>
> >>> - Mapping through host bridge with IOVA allocation and DMA_ATTR_MMIO
> >>> attribute for MMIO memory regions (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE).
> >>> This happens when the IOMMU is enabled and the ACS flags are forcing
> >>> all traffic to the IOMMU - ie for virtualization systems.
> >>>
> >>> - Cases where P2P is not supported through the host bridge/CPU. The
> >>> P2P subsystem is the proper place to detect this and block it.
> >>>
> >>> Helper functions fill_sg_entry() and calc_sg_nents() handle the
> >>> scatter-gather table construction, splitting large regions into
> >>> UINT_MAX-sized chunks to fit within sg->length field limits.
> >>>
> >>> Since the physical address based DMA API forbids use of the CPU list
> >>> of the scatterlist this will produce a mangled scatterlist that has
> >>> a fully zero-length and NULL'd CPU list. The list is 0 length,
> >>> all the struct page pointers are NULL and zero sized. This is stronger
> >>> and more robust than the existing mangle_sg_table() technique. It is
> >>> a future project to migrate DMABUF as a subsystem away from using
> >>> scatterlist for this data structure.
> >>>
> >>> Tested-by: Alex Mastro <amastro@...com>
> >>> Tested-by: Nicolin Chen <nicolinc@...dia.com>
> >>> Signed-off-by: Leon Romanovsky <leonro@...dia.com>
> >>> ---
> >>> drivers/dma-buf/dma-buf.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++
> >>> include/linux/dma-buf.h | 18 ++++
> >>> 2 files changed, 253 insertions(+)
> >>>
> >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>> index 2bcf9ceca997..cb55dff1dad5 100644
> >>> --- a/drivers/dma-buf/dma-buf.c
> >>> +++ b/drivers/dma-buf/dma-buf.c
> >>> @@ -1254,6 +1254,241 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
> >>> }
> >>> EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
> >>>
> >>> +static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
> >>> + dma_addr_t addr)
> >>> +{
> >>> + unsigned int len, nents;
> >>> + int i;
> >>> +
> >>> + nents = DIV_ROUND_UP(length, UINT_MAX);
> >>> + for (i = 0; i < nents; i++) {
> >>> + len = min_t(size_t, length, UINT_MAX);
> >>> + length -= len;
> >>> + /*
> >>> + * DMABUF abuses scatterlist to create a scatterlist
> >>> + * that does not have any CPU list, only the DMA list.
> >>> + * Always set the page related values to NULL to ensure
> >>> + * importers can't use it. The phys_addr based DMA API
> >>> + * does not require the CPU list for mapping or unmapping.
> >>> + */
> >>> + sg_set_page(sgl, NULL, 0, 0);
> >>> + sg_dma_address(sgl) = addr + i * UINT_MAX;
> >>> + sg_dma_len(sgl) = len;
> >>> + sgl = sg_next(sgl);
> >>> + }
> >>> +
> >>> + return sgl;
> >>> +}
> >>> +
> >>> +static unsigned int calc_sg_nents(struct dma_iova_state *state,
> >>> + struct dma_buf_phys_vec *phys_vec,
> >>> + size_t nr_ranges, size_t size)
> >>> +{
> >>> + unsigned int nents = 0;
> >>> + size_t i;
> >>> +
> >>> + if (!state || !dma_use_iova(state)) {
> >>> + for (i = 0; i < nr_ranges; i++)
> >>> + nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
> >>> + } else {
> >>> + /*
> >>> + * In IOVA case, there is only one SG entry which spans
> >>> + * for whole IOVA address space, but we need to make sure
> >>> + * that it fits sg->length, maybe we need more.
> >>> + */
> >>> + nents = DIV_ROUND_UP(size, UINT_MAX);
> >>> + }
> >>> +
> >>> + return nents;
> >>> +}
> >>> +
> >>> +/**
> >>> + * struct dma_buf_dma - holds DMA mapping information
> >>> + * @sgt: Scatter-gather table
> >>> + * @state: DMA IOVA state relevant in IOMMU-based DMA
> >>> + * @size: Total size of DMA transfer
> >>> + */
> >>> +struct dma_buf_dma {
> >>> + struct sg_table sgt;
> >>> + struct dma_iova_state *state;
> >>> + size_t size;
> >>> +};
> >>> +
> >>> +/**
> >>> + * dma_buf_map - Returns the scatterlist table of the attachment from arrays
> >>> + * of physical vectors. This funciton is intended for MMIO memory only.
> >>> + * @attach: [in] attachment whose scatterlist is to be returned
> >>> + * @provider: [in] p2pdma provider
> >>> + * @phys_vec: [in] array of physical vectors
> >>> + * @nr_ranges: [in] number of entries in phys_vec array
> >>> + * @size: [in] total size of phys_vec
> >>> + * @dir: [in] direction of DMA transfer
> >>> + *
> >>> + * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR
> >>> + * on error. May return -EINTR if it is interrupted by a signal.
> >>> + *
> >>> + * On success, the DMA addresses and lengths in the returned scatterlist are
> >>> + * PAGE_SIZE aligned.
> >>> + *
> >>> + * A mapping must be unmapped by using dma_buf_unmap().
> >>> + */
> >>> +struct sg_table *dma_buf_map(struct dma_buf_attachment *attach,
> >>
> >> That is clearly not a good name for this function. We already have overloaded the term *mapping* with something completely different.
> >
> > This function performs DMA mapping, so what name do you suggest instead of dma_buf_map()?
>
> Something like dma_buf_phys_vec_to_sg_table(). I'm not good at naming either.
Can I call it simply dma_buf_mapping() as I plan to put that function in dma_buf_mapping.c
file per-your request.
Regarding SG, the long term plan is to remove SG table completely, so at
least external users of DMABUF shouldn't be exposed to internal implementation
details (SG table).
Thanks
Powered by blists - more mailing lists