Message-ID: <20251204151003.171039-1-peterx@redhat.com>
Date: Thu, 4 Dec 2025 10:09:59 -0500
From: Peter Xu <peterx@...hat.com>
To: kvm@...r.kernel.org,
linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: Jason Gunthorpe <jgg@...dia.com>,
Nico Pache <npache@...hat.com>,
Zi Yan <ziy@...dia.com>,
Alex Mastro <amastro@...com>,
David Hildenbrand <david@...hat.com>,
Alex Williamson <alex@...zbot.org>,
Zhi Wang <zhiw@...dia.com>,
David Laight <david.laight.linux@...il.com>,
Yi Liu <yi.l.liu@...el.com>,
Ankit Agrawal <ankita@...dia.com>,
peterx@...hat.com,
Kevin Tian <kevin.tian@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: [PATCH v2 0/4] mm/vfio: huge pfnmaps with !MAP_FIXED mappings
This series is based on v6.18. It allows mmap(!MAP_FIXED) to work with
huge pfnmaps on a best-effort basis, and enables this for vfio-pci as
the first user.
v1: https://lore.kernel.org/r/20250613134111.469884-1-peterx@redhat.com
A changelog would not be meaningful because all the patches were rewritten
on top of a new interface introduced in this v2. Hence it is omitted.
In this version, a new file operation, get_mapping_order(), is introduced
(based on discussion with Jason on v1) to minimize the code needed for
drivers to implement this. It also helps avoid exporting any mm functions.
One can refer to the discussion in v1 for more information.
Currently, the get_mapping_order() API is defined as:
int (*get_mapping_order)(struct file *file, unsigned long pgoff, size_t len);
The first argument is the file pointer; the 2nd and 3rd are the pgoff and
len specified in a mmap() request. The driver can use this interface to
opt in to providing mapping order hints to core mm on VA allocations for
the specified range of the file. I kept the interface simple for now, so
core mm will always align the VA against pgoff, assuming that would
always work. The driver can only report the order for pgoff+len, which
will be used to do the alignment.
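To make the division of labour concrete, here is a small userspace model of
the two halves: a driver-side order hint and the core-mm-side alignment
against pgoff. It is only a sketch, not code from this series. The names
pick_order() and align_to_pgoff(), the 4K page size and the PMD/PUD orders
are assumptions for illustration, and the "largest order that fits in len"
policy is a guess at what a driver might report. The alignment arithmetic
follows how the existing THP unmapped-area code is understood to work.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB base pages, x86-64-like orders. */
#define PAGE_SHIFT 12u
#define PMD_ORDER   9   /* 2 MiB with 4 KiB pages */
#define PUD_ORDER  18   /* 1 GiB with 4 KiB pages */

/*
 * Hypothetical driver-side policy: report the largest huge mapping
 * order whose size fits within the requested length, or 0 for none.
 */
static int pick_order(size_t len)
{
    static const int orders[] = { PUD_ORDER, PMD_ORDER };
    size_t i;

    for (i = 0; i < sizeof(orders) / sizeof(orders[0]); i++) {
        uint64_t size = (uint64_t)1 << (orders[i] + PAGE_SHIFT);

        if ((uint64_t)len >= size)
            return orders[i];
    }
    return 0;
}

/*
 * Model of the core-mm side: given the start of a VA range that was
 * over-allocated by one huge-page size, return the first address in it
 * that is congruent to the file offset modulo the huge-page size.
 */
static uint64_t align_to_pgoff(uint64_t base, unsigned long pgoff, int order)
{
    uint64_t size = (uint64_t)1 << (order + PAGE_SHIFT);
    uint64_t off = (uint64_t)pgoff << PAGE_SHIFT;

    return base + ((off - base) & (size - 1));
}
```

With these assumptions a 128MB request yields the PMD order, and a range
starting at 0x7f0000123000 with pgoff 0 is moved up to 0x7f0000200000.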
Before this series, a userapp in most cases needs to be modified to benefit
from huge mappings, by providing a huge-size-aligned VA using MAP_FIXED.
After this series, the userapp can benefit from huge pfnmaps automatically
after the kernel upgrade, with no userspace modifications.
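For reference, the userspace workaround mentioned above usually looks
something like the following sketch: reserve an oversized PROT_NONE range,
pick an aligned address inside it, map the real object there with
MAP_FIXED, then trim the slack. The helper name mmap_aligned() is made up
for this example. It assumes len is a multiple of the page size and align
is a power of two no smaller than the page size, and it aligns the VA to
an absolute boundary (it does not account for a non-aligned file offset).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Hypothetical helper: map [offset, offset + len) of fd at an address
 * aligned to 'align'. Returns MAP_FAILED on error.
 */
static void *mmap_aligned(size_t len, size_t align, int prot, int flags,
                          int fd, off_t offset)
{
    size_t span = len + align;
    uint8_t *base, *aligned, *end, *span_end;
    void *p;

    /* Reserve VA only; nothing is accessible or backed yet. */
    base = mmap(NULL, span, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    /* Round up to the requested alignment inside the reservation. */
    aligned = (uint8_t *)(((uintptr_t)base + align - 1) &
                          ~((uintptr_t)align - 1));

    /* Replace part of the reservation with the real mapping. */
    p = mmap(aligned, len, prot, flags | MAP_FIXED, fd, offset);
    if (p == MAP_FAILED) {
        munmap(base, span);
        return MAP_FAILED;
    }

    /* Give back the unused head and tail of the reservation. */
    if (aligned > base)
        munmap(base, (size_t)(aligned - base));
    end = aligned + len;
    span_end = base + span;
    if (span_end > end)
        munmap(end, (size_t)(span_end - end));

    return p;
}
```

With this series applied, a plain mmap(NULL, ...) on a supporting file is
expected to get a suitably aligned address without any of the above.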
It's still best-effort, because the auto-alignment requires a larger VA
range to be allocated via the per-arch allocator. If the huge-mapping
aligned VA cannot be allocated, it will still fall back to small mappings
like before. However, that's the theory: in practice I don't yet know
when it would fail, especially on a 64-bit system.
So far, only vfio-pci is supported. But the logic should be applicable to
all the drivers that support or will support huge pfnmaps. I've also
copied some more people on this version for the hardware perspective.
For testing:
- checkpatch.pl
- cross build harness
- unit test that I got from Alex [1], checking mmap() alignments on a QEMU
instance with a 128MB BAR.
The alignments all look sane with mmap(!MAP_FIXED), and huge mappings
are properly installed. I didn't observe anything wrong.
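As a rough idea of what "checking alignments" means here, the sketch below
tests whether a returned VA and the mmap() file offset agree in their low
bits up to a given huge-page shift, which is the precondition for
installing a huge mapping at that level. This is not taken from the test
in [1]; the helper names are made up for illustration.

```c
#include <stdint.h>

/*
 * Number of low-order bits in which va and file_off agree, i.e. the
 * largest k such that va == file_off modulo 2^k (capped at 64).
 */
static unsigned congruent_bits(uint64_t va, uint64_t file_off)
{
    uint64_t diff = va ^ file_off;
    unsigned bits = 0;

    if (diff == 0)
        return 64;
    while ((diff & 1) == 0) {
        diff >>= 1;
        bits++;
    }
    return bits;
}

/* Nonzero if a huge mapping of size 2^shift could be used at va. */
static int huge_capable(uint64_t va, uint64_t file_off, unsigned shift)
{
    return congruent_bits(va, file_off) >= shift;
}
```

For a 2MB PMD mapping the shift would be 21, and for a 1GB PUD mapping 30,
assuming x86-64 style page table levels.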
I currently lack larger BARs to test PUD sizes. Please kindly report if
you can run this with 1G+ BARs and hit issues.
Alex Mastro: thanks for the testing offered in v1, but since this series
was rewritten, a re-test will be needed. I hence didn't collect the T-b.
Comments welcome, thanks.
[1] https://github.com/awilliam/tests/blob/vfio-pci-device-map-alignment/vfio-pci-device-map-alignment.c
Peter Xu (4):
mm/thp: Allow thp_get_unmapped_area_vmflags() to take alignment
mm: Add file_operations.get_mapping_order()
vfio: Introduce vfio_device_ops.get_mapping_order hook
vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Documentation/filesystems/vfs.rst | 4 +++
drivers/vfio/pci/vfio_pci.c | 1 +
drivers/vfio/pci/vfio_pci_core.c | 49 ++++++++++++++++++++++++++
drivers/vfio/vfio_main.c | 14 ++++++++
include/linux/fs.h | 1 +
include/linux/huge_mm.h | 5 +--
include/linux/vfio.h | 5 +++
include/linux/vfio_pci_core.h | 2 ++
mm/huge_memory.c | 7 ++--
mm/mmap.c | 58 +++++++++++++++++++++++++++----
10 files changed, 135 insertions(+), 11 deletions(-)
--
2.50.1