[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aFMQZru7l2aKVsZm@x1.local>
Date: Wed, 18 Jun 2025 15:15:50 -0400
From: Peter Xu <peterx@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kvm@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
Alex Williamson <alex.williamson@...hat.com>,
Zi Yan <ziy@...dia.com>, Alex Mastro <amastro@...com>,
David Hildenbrand <david@...hat.com>,
Nico Pache <npache@...hat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED
mappings
On Wed, Jun 18, 2025 at 02:46:41PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 18, 2025 at 12:56:01PM -0400, Peter Xu wrote:
> > So I changed my mind, slightly. I can still have the "order" parameter to
> > make the API cleaner (even if it'll be a pure overhead.. because all
> > existing caller will pass in PUD_SIZE as of now),
>
> That doesn't seem right, the callers should report the real value not
> artifically cap it.. Like ARM does have page sizes greater than PUD
> that might be interesting to enable someday for PFN users.
It needs to pass in PUD_SIZE to match what vfio-pci currently supports in
its huge_fault().
>
> > but I think I'll still
> > stick with the ifdef in patch 4, as I mentioned here:
>
> > https://lore.kernel.org/all/aFGMG3763eSv9l8b@x1.local/
> >
> > The problem is I just noticed yet again that exporting
> > huge_mapping_get_va_aligned() for all configs doesn't make sense. At least
> > it'll need something like this to make !MMU compile for VFIO, while this is
> > definitely some ugliness I also want to avoid..
>
> IMHO this uglyness should certainly be contained to the mm code and not
> leak into drivers.
>
> > There's just no way to provide a sane default value for !MMU.
>
> So all this mess seems to say that get_unmapped_area() is just the
> wrong fop to have here. It can't be implemented sanely for !MMU and
> has these weird conditions, like can't fail.
>
> I again suggest to just simplify and add an new fop
>
> size_t get_best_mapping_order(struct file *filp, pgoff_t pgoff,
> size_t length);
>
> Which will return the largest pgoff aligned order within pgoff/length
> that the FD could try to install. Very simple for the driver
> side. vfio pci will just return ilog2(bar_size).
>
> PAGE_SHIFT can be a safe default.
I agree this is a better way. We can make the PAGE_SHIFT by default or
just 0, because it doesn't sound necessary to me to support anything
smaller than PAGE_SIZE.. maybe a "int" retval would suffice to also cover
errors.
So this will introduce a new file operation that will only be used so far
in VFIO, playing similar role until we start to convert many
get_unmapped_area() to this one.
>
> Then put all this maze of conditionals in the mm side replacing the
> call to fops->get_unmapped_area() and don't export anything new. The
> mm will automaticall cap the alignment based on what the architecture
> can do and what
>
> !MMU would simply entirely ignore this new stuff.
For the long term, we should move all get_unmapped_area() users to the new
API. For old !MMU users, we should rename get_unmapped_area() to something
better, like get_mmap_addr(). For those cases it's really not about
looking for something not mapped, but normally exactly what is requested.
>
> > So going one step back: huge_mapping_get_va_aligned() (or whatever name we
> > prefer) doesn't make sense to be exported always, but only when CONFIG_MMU.
> > It should follow the same way we treat mm_get_unmapped_area().
>
> We just deleted !SMP, I really wonder if it is time for !MMU to go
> away too..
Yes, if this comes earlier, we can completely drop get_unmapped_area()
after all existing MMU users converted to the new one.
Any early objections / concerns / comments from anyone else, before I go
and introduce it?
--
Peter Xu
Powered by blists - more mailing lists