Message-ID: <20250727061615.GW402218@unreal>
Date: Sun, 27 Jul 2025 09:16:15 +0300
From: Leon Romanovsky <leon@...nel.org>
To: "Kasireddy, Vivek" <vivek.kasireddy@...el.com>
Cc: Alex Williamson <alex.williamson@...hat.com>,
Christoph Hellwig <hch@....de>, Jason Gunthorpe <jgg@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Christian König <christian.koenig@....com>,
"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
Jens Axboe <axboe@...nel.dk>,
Jérôme Glisse <jglisse@...hat.com>,
Joerg Roedel <joro@...tes.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linaro-mm-sig@...ts.linaro.org" <linaro-mm-sig@...ts.linaro.org>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-media@...r.kernel.org" <linux-media@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
Logan Gunthorpe <logang@...tatee.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Robin Murphy <robin.murphy@....com>,
Sumit Semwal <sumit.semwal@...aro.org>,
Will Deacon <will@...nel.org>
Subject: Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO
regions
On Fri, Jul 25, 2025 at 05:34:40AM +0000, Kasireddy, Vivek wrote:
> Hi Leon,
>
> > Subject: Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO
> > regions
> >
> > > >
> > > > From: Leon Romanovsky <leonro@...dia.com>
> > > >
> > > > Add support for exporting PCI device MMIO regions through dma-buf,
> > > > enabling safe sharing of non-struct page memory with controlled
> > > > lifetime management. This allows RDMA and other subsystems to import
> > > > dma-buf FDs and build them into memory regions for PCI P2P operations.
> > > >
> > > > The implementation provides a revocable attachment mechanism using
> > > > dma-buf move operations. MMIO regions are normally pinned as BARs
> > > > don't change physical addresses, but access is revoked when the VFIO
> > > > device is closed or a PCI reset is issued. This ensures kernel
> > > > self-defense against potentially hostile userspace.
> > > >
> > > > Signed-off-by: Jason Gunthorpe <jgg@...dia.com>
> > > > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@...el.com>
> > > > Signed-off-by: Leon Romanovsky <leonro@...dia.com>
> > > > ---
> > > > drivers/vfio/pci/Kconfig | 20 ++
> > > > drivers/vfio/pci/Makefile | 2 +
> > > > drivers/vfio/pci/vfio_pci_config.c | 22 +-
> > > > drivers/vfio/pci/vfio_pci_core.c | 25 ++-
> > > > drivers/vfio/pci/vfio_pci_dmabuf.c | 321 +++++++++++++++++++++++++++++
> > > > drivers/vfio/pci/vfio_pci_priv.h | 23 +++
> > > > include/linux/dma-buf.h | 1 +
> > > > include/linux/vfio_pci_core.h | 3 +
> > > > include/uapi/linux/vfio.h | 19 ++
> > > > 9 files changed, 431 insertions(+), 5 deletions(-)
> > > > create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
> >
> > <...>
> >
> > > > +static int validate_dmabuf_input(struct vfio_pci_core_device *vdev,
> > > > +				 struct vfio_device_feature_dma_buf *dma_buf)
> > > > +{
> > > > + struct pci_dev *pdev = vdev->pdev;
> > > > + u32 bar = dma_buf->region_index;
> > > > + u64 offset = dma_buf->offset;
> > > > + u64 len = dma_buf->length;
> > > > + resource_size_t bar_size;
> > > > + u64 sum;
> > > > +
> > > > + /*
> > > > + * For PCI the region_index is the BAR number like everything else.
> > > > + */
> > > > + if (bar >= VFIO_PCI_ROM_REGION_INDEX)
> > > > + return -ENODEV;
> >
> > <...>
> >
> > > > +/**
> > > > + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
> > > > + * regions selected.
> > > > + *
> > > > + * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
> > > > + * etc. offset/length specify a slice of the region to create the dmabuf from.
> > > > + * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
> > > Any particular reason why you dropped the option (nr_ranges) of creating a
> > > single dmabuf from multiple ranges of an MMIO region?
> >
> > I did it for two reasons. First, I wanted to simplify the code in order
> > to speed up discussion of the patchset itself. Second, I failed to
> > find justification for the need for multiple ranges, as the number of BARs
> > is limited by VFIO_PCI_ROM_REGION_INDEX (6) and the same functionality
> > can be achieved by multiple calls to DMABUF import.
> I don't think the same functionality can be achieved by multiple calls to
> dmabuf import. AFAIU, a dmabuf (as of today) is backed by an SGL that can
> have multiple entries because it represents a scattered buffer (multiple
> non-contiguous entries in System RAM or an MMIO region).
I don't know all the reasons why SG was chosen, but one of the main
reasons is that the DMA SG API was the only possible way to handle p2p
transfers (the peer2peer flag).
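
For reference, a minimal sketch of the importer side that opts into p2p
via that flag (the demo_* names are made up, locking and error handling
are simplified; move_notify is where the revocation described in the
commit message lands on the importer):

#include <linux/device.h>
#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Called by the exporter when the buffer is revoked or moved, e.g. on
 * VFIO device close or PCI reset; the importer must stop DMA here.
 */
static void demo_move_notify(struct dma_buf_attachment *attach)
{
}

static const struct dma_buf_attach_ops demo_attach_ops = {
	.allow_peer2peer = true,	/* importer can handle PCI P2P addresses */
	.move_notify = demo_move_notify,
};

static int demo_import(struct device *dev, int fd)
{
	struct dma_buf *dmabuf;
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
	struct scatterlist *sg;
	int i, ret = 0;

	dmabuf = dma_buf_get(fd);
	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);

	attach = dma_buf_dynamic_attach(dmabuf, dev, &demo_attach_ops, NULL);
	if (IS_ERR(attach)) {
		ret = PTR_ERR(attach);
		goto out_put;
	}

	sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		ret = PTR_ERR(sgt);
		goto out_detach;
	}

	/* A scattered buffer shows up as multiple DMA address/length entries. */
	for_each_sgtable_dma_sg(sgt, sg, i)
		dev_info(dev, "entry %d: %pad + %u\n", i,
			 &sg_dma_address(sg), sg_dma_len(sg));

	dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_BIDIRECTIONAL);
out_detach:
	dma_buf_detach(dmabuf, attach);
out_put:
	dma_buf_put(dmabuf);
	return ret;
}
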
> But in this patch you are constraining it such that only one entry associated with a
> buffer would be included, which effectively means that we cannot create
> a dmabuf to represent scattered buffers (located in a single MMIO region
> such as VRAM or other device memory) anymore.
Yes
>
> >
> > >
> > > Restricting the dmabuf to a single range (or having to create multiple dmabufs
> > > to represent multiple regions/ranges associated with a single scattered buffer)
> > > would be very limiting and may not work in all cases. For instance, in my use-case,
> > > I am trying to share a large (4k mode) framebuffer (FB) located in GPU's VRAM
> > > between two (p2p compatible) GPU devices. And, this would probably not work
> > > given that allocating a large contiguous FB (nr_ranges = 1) in VRAM may not be
> > > feasible when there is memory pressure.
> >
> > Can you please help me and point to the place in the code where this can fail?
> > I'm probably missing something basic as there are no large allocations
> > in the current patchset.
> Sorry, I was not very clear. What I meant is that it is not prudent to assume that
> there will only be one range associated with an MMIO region that we need to
> consider while creating a dmabuf. And I was pointing out my use-case as an
> example where vfio-pci needs to create a dmabuf for a large buffer (FB) that
> would likely be scattered (and not contiguous) in an MMIO region (such as VRAM).
>
> Let me further explain with my use-case. Here is a link to my Qemu-based test:
> https://gitlab.freedesktop.org/Vivek/qemu/-/commit/b2bdb16d9cfaf55384c95b1f060f175ad1c89e95#81dc845f0babf39649c4e086e173375614111b4a_29_46
Ohh, thanks. I'll add nr_ranges in the next version. I see that you are
using the same region_index for all ranges, and this is how I would like to
keep it: "multiple nr_ranges, same region_index".
Thanks