linux-kernel - Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aFMQZru7l2aKVsZm@x1.local>
Date: Wed, 18 Jun 2025 15:15:50 -0400
From: Peter Xu <peterx@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	kvm@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
	Alex Williamson <alex.williamson@...hat.com>,
	Zi Yan <ziy@...dia.com>, Alex Mastro <amastro@...com>,
	David Hildenbrand <david@...hat.com>,
	Nico Pache <npache@...hat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED
 mappings

On Wed, Jun 18, 2025 at 02:46:41PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 18, 2025 at 12:56:01PM -0400, Peter Xu wrote:
> > So I changed my mind, slightly.  I can still have the "order" parameter to
> > make the API cleaner (even if it'll be a pure overhead.. because all
> > existing caller will pass in PUD_SIZE as of now), 
> 
> That doesn't seem right, the callers should report the real value not
> artifically cap it.. Like ARM does have page sizes greater than PUD
> that might be interesting to enable someday for PFN users.

It needs to pass in PUD_SIZE to match what vfio-pci currently supports in
its huge_fault().

> 
> > but I think I'll still
> > stick with the ifdef in patch 4, as I mentioned here:
> 
> > https://lore.kernel.org/all/aFGMG3763eSv9l8b@x1.local/
> > 
> > The problem is I just noticed yet again that exporting
> > huge_mapping_get_va_aligned() for all configs doesn't make sense.  At least
> > it'll need something like this to make !MMU compile for VFIO, while this is
> > definitely some ugliness I also want to avoid..
> 
> IMHO this uglyness should certainly be contained to the mm code and not
> leak into drivers.
> 
> > There's just no way to provide a sane default value for !MMU.
> 
> So all this mess seems to say that get_unmapped_area() is just the
> wrong fop to have here. It can't be implemented sanely for !MMU and
> has these weird conditions, like can't fail.
> 
> I again suggest to just simplify and add an new fop 
> 
> size_t get_best_mapping_order(struct file *filp, pgoff_t pgoff,
>                               size_t length);
> 
> Which will return the largest pgoff aligned order within pgoff/length
> that the FD could try to install. Very simple for the driver
> side. vfio pci will just return ilog2(bar_size).
> 
> PAGE_SHIFT can be a safe default.

I agree this is a better way.  We can make the PAGE_SHIFT by default or
just 0, because it doesn't sound necessary to me to support anything
smaller than PAGE_SIZE.. maybe a "int" retval would suffice to also cover
errors.

So this will introduce a new file operation that will only be used so far
in VFIO, playing similar role until we start to convert many
get_unmapped_area() to this one.

> 
> Then put all this maze of conditionals in the mm side replacing the
> call to fops->get_unmapped_area() and don't export anything new. The
> mm will automaticall cap the alignment based on what the architecture
> can do and what 
> 
> !MMU would simply entirely ignore this new stuff.

For the long term, we should move all get_unmapped_area() users to the new
API.  For old !MMU users, we should rename get_unmapped_area() to something
better, like get_mmap_addr().  For those cases it's really not about
looking for something not mapped, but normally exactly what is requested.

> 
> > So going one step back: huge_mapping_get_va_aligned() (or whatever name we
> > prefer) doesn't make sense to be exported always, but only when CONFIG_MMU.
> > It should follow the same way we treat mm_get_unmapped_area().
> 
> We just deleted !SMP, I really wonder if it is time for !MMU to go
> away too..

Yes, if this comes earlier, we can completely drop get_unmapped_area()
after all existing MMU users converted to the new one.

Any early objections / concerns / comments from anyone else, before I go
and introduce it?

-- 
Peter Xu