Message-ID: <aTnWphMGVwWl12FX@x1.local>
Date: Wed, 10 Dec 2025 15:23:02 -0500
From: Peter Xu <peterx@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: kvm@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Nico Pache <npache@...hat.com>, Zi Yan <ziy@...dia.com>,
Alex Mastro <amastro@...com>, David Hildenbrand <david@...hat.com>,
Alex Williamson <alex@...zbot.org>, Zhi Wang <zhiw@...dia.com>,
David Laight <david.laight.linux@...il.com>,
Yi Liu <yi.l.liu@...el.com>, Ankit Agrawal <ankita@...dia.com>,
Kevin Tian <kevin.tian@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2 2/4] mm: Add file_operations.get_mapping_order()

On Sun, Dec 07, 2025 at 12:21:32PM -0400, Jason Gunthorpe wrote:
> On Thu, Dec 04, 2025 at 10:10:01AM -0500, Peter Xu wrote:
> > Add one new file operation, get_mapping_order(). It can be used by file
> > backends to report mapping order hints.
> >
> > By default, Linux assumes it will map in PAGE_SIZE chunks. With this hint,
> > the driver can report the possibility of mapping chunks that are larger
> > than PAGE_SIZE. Then, the VA allocator will try to use that as alignment
> > when allocating the VA ranges.
> >
> > This is useful because when chunks to be mapped are larger than PAGE_SIZE,
> > VA alignment matters and it needs to be aligned with the size of the chunk
> > to be mapped.
> >
> > That said, no matter what alignment is used for the VA allocation, the
> > driver can still decide at which size to map the chunks. It is also not
> > an issue if it keeps mapping in PAGE_SIZE.
> >
> > get_mapping_order() is defined to take three parameters. The 1st
> > parameter is the file object pointer; the 2nd and 3rd parameters are
> > the pgoff and size of the mmap() request. Its retval is defined as the
> > order, which must be non-negative to enable the alignment. When zero is
> > returned, it should behave as if the hint were not provided, IOW,
> > alignment will still be PAGE_SIZE.
>
> This should explain how it works when the incoming pgoff is not
> aligned..

Hmm, I thought the charm of this new proposal (based on suggestions from your
v1 reviews) is that we don't need to worry about this.. Or maybe you meant I
should add some doc comments in the commit message? If so I can do that.

thp_get_unmapped_area_vmflags() should already take all kinds of pgoff
misalignment into account. It's just that this v2 is better than v1 when
using this new API because that THP function doesn't need to be exported
anymore.
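
To make the pgoff handling concrete, here is a minimal userspace sketch of the
pad-and-shift idea. It is not code from this series: pick_aligned_va() is a
hypothetical helper, and the 4 KiB base page size is an assumption. It assumes
the caller has already found a free VA window padded by one chunk, and only
shows how the start address can be shifted so that the low bits of the VA
match the low bits of the file offset.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB base pages */

/*
 * Hypothetical helper, not a kernel API.
 *
 * Given the start of a free VA window that is at least
 * len + (PAGE_SIZE << order) bytes long, return the lowest address in
 * that window which satisfies
 *
 *     va % chunk == (pgoff << PAGE_SHIFT) % chunk
 *
 * where chunk = PAGE_SIZE << order.
 */
static uint64_t pick_aligned_va(uint64_t window_start, uint64_t pgoff,
                                unsigned int order)
{
    assert(order < 64 - PAGE_SHIFT);

    uint64_t chunk = (uint64_t)1 << (PAGE_SHIFT + order);
    uint64_t want = (pgoff << PAGE_SHIFT) & (chunk - 1); /* offset within chunk */
    uint64_t have = window_start & (chunk - 1);          /* VA within chunk */

    /* Move forward by the (mod chunk) distance between the two. */
    return window_start + ((want - have) & (chunk - 1));
}
```

With order 9 (2M chunks), a request at pgoff 0x201 (2M + 4k) ends up at a VA
that is 4k past a 2M boundary, so only the first PMD is fragmented and the
rest of the range can still use 2M mappings.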
>
> I think for dpdk we want to support mapping around the MSI hole so
> something like
>
>   pgoff 0 -> 2M
>   skip 4k
>   2M + 4k -> 64M
>
> Should setup the last VMA to align to 2M + 4k so the first PMD is
> fragmented to 4k pages but the remaining part is 2M sized or better.
>
> We just noticed a bug very similar to this in qemu around its manual
> alignment scheme where it would de-align things around the MSI window
> and spoil the PMDs.

Right, IIUC this series should work exactly as you described.

Here the driver only needs to care about what backs the (pgoff, len) range,
and the proper order to map those chunks. In the case of VFIO, it knows
which BAR it's mapping, so it can return a proper order for the specific BAR
pointed to by (pgoff, len).

The driver doesn't need to worry about anything else like the above.

Let me know if I misread your question, or if this series doesn't achieve
what you're asking for here..
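
As an illustration of the driver side, a rough userspace sketch of such a
hint could look like the following. Everything here is hypothetical and not
taken from the VFIO patch: struct bar_region, sketch_get_mapping_order(), the
4 KiB page size and the 2M cap are all assumptions made for the example. The
point is only that the callback looks at which region owns (pgoff, len) and
nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT      12  /* assumption: 4 KiB base pages */
#define MAX_HINT_ORDER  9   /* assumption: cap the hint at 2M chunks */

/* Hypothetical description of one BAR in the device file's offset space. */
struct bar_region {
    uint64_t pgoff_start;   /* first page offset of the BAR */
    uint64_t nr_pages;      /* BAR size in pages */
};

/* floor(log2(n)) for n >= 1 */
static int ilog2_u64(uint64_t n)
{
    int order = 0;

    assert(n != 0);
    while (n >>= 1)
        order++;
    return order;
}

/*
 * Return an order hint for the mmap() request (pgoff, len), or 0 when
 * the range is not fully contained in a single known BAR. Zero means
 * "no hint", i.e. plain PAGE_SIZE alignment.
 */
static int sketch_get_mapping_order(const struct bar_region *bars,
                                    size_t nr_bars,
                                    uint64_t pgoff, uint64_t len)
{
    /* Round the request length up to whole pages. */
    uint64_t nr = (len + ((uint64_t)1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT;

    for (size_t i = 0; i < nr_bars; i++) {
        uint64_t start = bars[i].pgoff_start;
        uint64_t end = start + bars[i].nr_pages;

        if (bars[i].nr_pages == 0)
            continue;
        if (pgoff >= start && pgoff < end && nr <= end - pgoff) {
            int order = ilog2_u64(bars[i].nr_pages);

            return order < MAX_HINT_ORDER ? order : MAX_HINT_ORDER;
        }
    }
    return 0;
}
```

Note the hint does not depend on how pgoff is aligned inside the BAR; under
this design that part is left to the core VA allocation code.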
Thanks,
>
> I guess ideally the file could return the order assuming an aligned-to-start
> pgoff and the core code could use that order to compute an adjustment for
> the actual pgoff so we maintain:
> va % order = pgoff % order
>
> Jason
>
--
Peter Xu