Message-ID: <e2033095-9bf1-4d9c-9a5b-01148eaffc30@redhat.com>
Date: Thu, 4 Dec 2025 19:16:54 +0100
From: Cédric Le Goater <clg@...hat.com>
To: Peter Xu <peterx@...hat.com>, kvm@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: Jason Gunthorpe <jgg@...dia.com>, Nico Pache <npache@...hat.com>,
Zi Yan <ziy@...dia.com>, Alex Mastro <amastro@...com>,
David Hildenbrand <david@...hat.com>, Alex Williamson <alex@...zbot.org>,
Zhi Wang <zhiw@...dia.com>, David Laight <david.laight.linux@...il.com>,
Yi Liu <yi.l.liu@...el.com>, Ankit Agrawal <ankita@...dia.com>,
Kevin Tian <kevin.tian@...el.com>, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2 0/4] mm/vfio: huge pfnmaps with !MAP_FIXED mappings
On 12/4/25 16:09, Peter Xu wrote:
> This series is based on v6.18. It allows mmap(!MAP_FIXED) to work with
> huge pfnmaps on a best-effort basis, and enables this for vfio-pci as
> the first user.
>
> v1: https://lore.kernel.org/r/20250613134111.469884-1-peterx@redhat.com
>
> A changelog would not be meaningful because all the patches were rewritten
> on top of a new interface introduced in this v2, so it is omitted.
>
> In this version, a new file operation, get_mapping_order(), is introduced
> (based on discussion with Jason on v1) to minimize the code needed for
> drivers to implement this. It also helps avoid exporting any mm functions.
> One can refer to the discussion in v1 for more information.
>
> Currently, the get_mapping_order() API is defined as:
>
> int (*get_mapping_order)(struct file *file, unsigned long pgoff, size_t len);
>
> The first argument is the file pointer; the 2nd and 3rd are the pgoff and
> len specified by an mmap() request. The driver can use this interface to
> opt in to providing mapping order hints to core mm on VA allocations for
> the specified range of the file. I kept the interface simple for now, so
> core mm will always do the alignment with pgoff, assuming that would
> always work. The driver can only report the order from pgoff+len, which
> will be used to do the alignment.
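
For readers following along, here is a minimal userspace model of how a
driver might derive an order from pgoff+len. It is not code from this
series: model_mapping_order() is a hypothetical name, the PAGE_SHIFT,
PMD_ORDER and PUD_ORDER values assume 4 KiB pages with x86-64 style huge
page sizes, and the real hook's return convention may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: 4 KiB base pages with x86-64 style PMD/PUD sizes. */
#define PAGE_SHIFT 12
#define PMD_ORDER  9   /* 2 MiB */
#define PUD_ORDER  18  /* 1 GiB */

/*
 * Hypothetical model (not the vfio-pci implementation): return the largest
 * order for which [pgoff, pgoff + len) contains at least one naturally
 * aligned huge page, or 0 when only small mappings are possible.
 */
static int model_mapping_order(unsigned long pgoff, size_t len)
{
	static const int orders[] = { PUD_ORDER, PMD_ORDER };
	unsigned long end;
	size_t i;

	assert(len > 0);
	end = pgoff + (len >> PAGE_SHIFT);	/* one past the last page */

	for (i = 0; i < sizeof(orders) / sizeof(orders[0]); i++) {
		unsigned long pages = 1UL << orders[i];
		/* First huge-page boundary at or after pgoff. */
		unsigned long first = (pgoff + pages - 1) & ~(pages - 1);

		if (first + pages <= end)
			return orders[i];
	}
	return 0;
}
```

Under these assumptions a 128MB range starting at pgoff 0 reports the PMD
order, and a range too small or too misaligned to hold one aligned 2 MiB
chunk reports 0.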
>
> Before this series, a userspace app in most cases needs to be modified to
> benefit from huge mappings, by providing a huge-size-aligned VA using
> MAP_FIXED. After this series, the app can benefit from huge pfnmaps
> automatically after a kernel upgrade, with no userspace modifications.
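
As an illustration of the userspace workaround described above, the usual
pattern is to reserve an oversized PROT_NONE range, round up to the wanted
alignment, and place the real mapping there with MAP_FIXED. The sketch
below is only an assumption of how an app might do it: mmap_aligned() is a
hypothetical helper, and it assumes len is a multiple of the page size and
that the file offset is itself aligned to align.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Hypothetical helper: map `len` bytes of `fd` at `offset` to an address
 * aligned to `align` (a power of two). Returns MAP_FAILED on error.
 * Assumes `len` is a multiple of the page size.
 */
static void *mmap_aligned(size_t len, size_t align, int prot, int flags,
			  int fd, off_t offset)
{
	size_t span = len + align;
	size_t head, tail;
	uintptr_t aligned;
	uint8_t *base;
	void *p;

	if (align == 0 || (align & (align - 1)) != 0 || span < len)
		return MAP_FAILED;

	/* Reserve enough VA that an aligned window of `len` must fit. */
	base = mmap(NULL, span, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return MAP_FAILED;

	aligned = ((uintptr_t)base + align - 1) & ~((uintptr_t)align - 1);

	/* Replace part of the reservation with the real mapping. */
	p = mmap((void *)aligned, len, prot, flags | MAP_FIXED, fd, offset);
	if (p == MAP_FAILED) {
		munmap(base, span);
		return MAP_FAILED;
	}

	/* Give back the unused head and tail of the reservation. */
	head = aligned - (uintptr_t)base;
	tail = span - head - len;
	if (head)
		munmap(base, head);
	if (tail)
		munmap((uint8_t *)aligned + len, tail);
	return p;
}
```

For a vfio-pci BAR one would pass the device fd and the region offset. The
point of the series is that this extra step should no longer be needed.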
>
> It's still best-effort, because the auto-alignment requires a larger VA
> range to be allocated via the per-arch allocator; if a huge-mapping
> aligned VA cannot be allocated, it will still fall back to small mappings
> like before. However, that's from a theoretical point of view: in practice
> I don't yet know when it would fail, especially on a 64-bit system.
>
> So far, only vfio-pci is supported. But the logic should be applicable to
> all the drivers that support or will support huge pfnmaps. I've also
> copied some more people on this version for the hardware perspective.
>
> For testing:
>
> - checkpatch.pl
> - cross build harness
> - unit test that I got from Alex [1], checking mmap() alignments on a QEMU
> instance with a 128MB BAR.
>
> The alignments all look sane with mmap(!MAP_FIXED), and huge mappings
> are properly installed. I didn't observe anything wrong.
>
> I currently lack larger BARs to test PUD sizes. Please kindly report if
> you can run this with 1G+ BARs and hit issues.
LGTM, with a 32G BAR:
Using device 0000:02:00.0 in IOMMU group 27
Device 0000:02:00.0 supports 9 regions, 5 irqs
[BAR0]: size 0x1000000, order 24, offset 0x0, flags 0xf
Testing BAR0, require at least 21 bit alignment
[PASS] Minimum alignment 21
Testing random offset
[PASS] Random offset
Testing random size
[PASS] Random size
[BAR1]: size 0x800000000, order 35, offset 0x10000000000, flags 0x7
Testing BAR1, require at least 30 bit alignment
[PASS] Minimum alignment 31
Testing random offset
[PASS] Random offset
Testing random size
[PASS] Random size
[BAR3]: size 0x2000000, order 25, offset 0x30000000000, flags 0x7
Testing BAR3, require at least 21 bit alignment
[PASS] Minimum alignment 21
Testing random offset
[PASS] Random offset
Testing random size
[PASS] Random size
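
To help read the numbers above, here is a small sketch of the arithmetic,
assuming "order" is log2 of the BAR size and "alignment" is the number of
trailing zero bits in the address returned by mmap(). That reading is an
assumption based on the values printed, and the helper names are
hypothetical, not taken from the test program.

```c
#include <assert.h>
#include <stdint.h>

/* Number of trailing zero bits in a non-zero address. */
static unsigned int alignment_bits(uint64_t addr)
{
	unsigned int bits = 0;

	assert(addr != 0);
	while ((addr & 1) == 0) {
		addr >>= 1;
		bits++;
	}
	return bits;
}

/* floor(log2(size)) for a non-zero size. */
static unsigned int size_order(uint64_t size)
{
	unsigned int order = 0;

	assert(size != 0);
	while (size >>= 1)
		order++;
	return order;
}

/* Non-zero when addr is aligned to at least required_bits bits. */
static int meets_alignment(uint64_t addr, unsigned int required_bits)
{
	return alignment_bits(addr) >= required_bits;
}
```

With this reading, a BAR of size 0x800000000 has order 35, and an address
with 31 trailing zero bits satisfies a 30 bit (1G, PUD) requirement.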
C.
>
> Alex Mastro: thanks for the testing offered in v1, but since this series
> was rewritten, a re-test will be needed. I hence didn't collect the T-b.
>
> Comments welcome, thanks.
>
> [1] https://github.com/awilliam/tests/blob/vfio-pci-device-map-alignment/vfio-pci-device-map-alignment.c
>
> Peter Xu (4):
> mm/thp: Allow thp_get_unmapped_area_vmflags() to take alignment
> mm: Add file_operations.get_mapping_order()
> vfio: Introduce vfio_device_ops.get_mapping_order hook
> vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
>
> Documentation/filesystems/vfs.rst | 4 +++
> drivers/vfio/pci/vfio_pci.c | 1 +
> drivers/vfio/pci/vfio_pci_core.c | 49 ++++++++++++++++++++++++++
> drivers/vfio/vfio_main.c | 14 ++++++++
> include/linux/fs.h | 1 +
> include/linux/huge_mm.h | 5 +--
> include/linux/vfio.h | 5 +++
> include/linux/vfio_pci_core.h | 2 ++
> mm/huge_memory.c | 7 ++--
> mm/mmap.c | 58 +++++++++++++++++++++++++++----
> 10 files changed, 135 insertions(+), 11 deletions(-)
>