[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d6fbee39-a38f-4f94-bffb-938f7be73681@redhat.com>
Date: Fri, 13 Jun 2025 20:09:41 +0200
From: David Hildenbrand <david@...hat.com>
To: Peter Xu <peterx@...hat.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, kvm@...r.kernel.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Alex Williamson <alex.williamson@...hat.com>, Zi Yan <ziy@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>, Alex Mastro <amastro@...com>,
Nico Pache <npache@...hat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED
mappings
On 13.06.25 15:41, Peter Xu wrote:
> This patch enables best-effort mmap() for vfio-pci bars even without
> MAP_FIXED, so as to utilize huge pfnmaps as much as possible. It should
> also avoid userspace changes (switching to MAP_FIXED with pre-aligned VA
> addresses) to start enabling huge pfnmaps on VFIO bars.
>
> Here the trick is making sure the MMIO PFNs will be aligned with the VAs
> allocated from mmap() when !MAP_FIXED, so that whatever returned from
> mmap(!MAP_FIXED) of vfio-pci MMIO regions will be automatically suitable
> for huge pfnmaps as much as possible.
>
> To achieve that, a custom vfio_device's get_unmapped_area() for vfio-pci
> devices is needed.
>
> Note that MMIO physical addresses should normally be guaranteed to be
> always bar-size aligned, hence the bar offset can logically be directly
> used to do the calculation. However to make it strict and clear (rather
> than relying on spec details), we still try to fetch the bar's physical
> addresses from pci_dev.resource[].
>
> Signed-off-by: Alex Williamson <alex.williamson@...hat.com>
There is likely a
Co-developed-by: Alex Williamson <alex.williamson@...hat.com>
missing?
> Signed-off-by: Peter Xu <peterx@...hat.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 3 ++
> drivers/vfio/pci/vfio_pci_core.c | 65 ++++++++++++++++++++++++++++++++
> include/linux/vfio_pci_core.h | 6 +++
> 3 files changed, 74 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 5ba39f7623bb..d9ae6cdbea28 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -144,6 +144,9 @@ static const struct vfio_device_ops vfio_pci_ops = {
> .detach_ioas = vfio_iommufd_physical_detach_ioas,
> .pasid_attach_ioas = vfio_iommufd_physical_pasid_attach_ioas,
> .pasid_detach_ioas = vfio_iommufd_physical_pasid_detach_ioas,
> +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> + .get_unmapped_area = vfio_pci_core_get_unmapped_area,
> +#endif
> };
>
> static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 6328c3a05bcd..835bc168f8b7 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1641,6 +1641,71 @@ static unsigned long vma_to_pfn(struct vm_area_struct *vma)
> return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
> }
>
> +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> +/*
> + * Hint function to provide mmap() virtual address candidate so as to be
> + * able to map huge pfnmaps as much as possible. It is done by aligning
> + * the VA to the PFN to be mapped in the specific bar.
> + *
> + * Note that this function does the minimum check on mmap() parameters to
> + * make the PFN calculation valid only. The majority of mmap() sanity check
> + * will be done later in mmap().
> + */
> +unsigned long vfio_pci_core_get_unmapped_area(struct vfio_device *device,
> + struct file *file,
> + unsigned long addr,
> + unsigned long len,
> + unsigned long pgoff,
> + unsigned long flags)
A very suboptimal way to indent this many parameters; just use two tabs
at the beginning.
> +{
> + struct vfio_pci_core_device *vdev =
> + container_of(device, struct vfio_pci_core_device, vdev);
> + struct pci_dev *pdev = vdev->pdev;
> + unsigned long ret, phys_len, req_start, phys_addr;
> + unsigned int index;
> +
> + index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
Could do
unsigned int index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
at the very top.
> +
> + /* Currently, only bars 0-5 supports huge pfnmap */
> + if (index >= VFIO_PCI_ROM_REGION_INDEX)
> + goto fallback;
> +
> + /* Bar offset */
> + req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
> + phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
> +
> + /*
> + * Make sure we at least can get a valid physical address to do the
> + * math. If this happens, it will probably fail mmap() later..
> + */
> + if (req_start >= phys_len)
> + goto fallback;
> +
> + phys_len = MIN(phys_len, len);
> + /* Calculate the start of physical address to be mapped */
> + phys_addr = pci_resource_start(pdev, index) + req_start;
> +
> + /* Choose the alignment */
> + if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && phys_len >= PUD_SIZE) {
> + ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> + flags, PUD_SIZE, 0);
> + if (ret)
> + return ret;
> + }
> +
> + if (phys_len >= PMD_SIZE) {
> + ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> + flags, PMD_SIZE, 0);
> + if (ret)
> + return ret;
Similar to Jason, I wonder if that logic should reside in the core, and
we only indicate the maximum page table level we support.
unsigned int order)
> {
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index fbb472dd99b3..e59699e01901 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -119,6 +119,12 @@ ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> size_t count, loff_t *ppos);
> ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> size_t count, loff_t *ppos);
> +unsigned long vfio_pci_core_get_unmapped_area(struct vfio_device *device,
> + struct file *file,
> + unsigned long addr,
> + unsigned long len,
> + unsigned long pgoff,
> + unsigned long flags);
Dito.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists