Message-ID: <aFQkxg08fs7jwXnJ@x1.local>
Date: Thu, 19 Jun 2025 10:55:02 -0400
From: Peter Xu <peterx@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kvm@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
Alex Williamson <alex.williamson@...hat.com>,
Zi Yan <ziy@...dia.com>, Alex Mastro <amastro@...com>,
David Hildenbrand <david@...hat.com>,
Nico Pache <npache@...hat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED
mappings
On Thu, Jun 19, 2025 at 10:58:52AM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 18, 2025 at 03:15:50PM -0400, Peter Xu wrote:
> > > > So I changed my mind, slightly. I can still have the "order" parameter to
> > > > make the API cleaner (even if it'll be a pure overhead.. because all
> > > > existing caller will pass in PUD_SIZE as of now),
> > >
> > > That doesn't seem right, the callers should report the real value not
> > > artifically cap it.. Like ARM does have page sizes greater than PUD
> > > that might be interesting to enable someday for PFN users.
> >
> > It needs to pass in PUD_SIZE to match what vfio-pci currently supports in
> > its huge_fault().
>
> Hm, OK that does make sense. I would add a small comment though as it
> is not so intuitive and may not apply to something using ioremap..

Sure, I'll remember to add a comment if I go back to the old interface.
I hope that won't happen.

>
> > So this will introduce a new file operation that will only be used so far
> > in VFIO, playing similar role until we start to convert many
> > get_unmapped_area() to this one.
>
> Yes, if someone wants to do a project here you can markup
> memfds/shmem/hugetlbfs/etc/etc to define their internal folio orders
> and hopefully ultimately remove some of that alignment logic from the
> arch code.

I'm a bit reluctant to touch all of those files just for this, but I can
definitely add a very verbose explanation to the commit log when I
introduce the new API, covering not only its relationship to the old APIs
but also possible future work.

Besides the get_unmapped_area() -> NEW API conversions, which are arch
independent in most cases, it would indeed be great to reduce the per-arch
alignment requirements as much as possible. At least that should apply to
hugetlbfs, which shouldn't be arch-dependent. I am not sure about the
rest, though. For example, I see archs may treat PF_RANDOMIZE differently.
There might be a lot of trivial details to look at.

OTOH, one other thought (which may not require auditing all archs) is that
it does look confusing to have two layers of alignment operations, which is
at least the case for THP right now. So it might be good to at least punch
it through to use vm_unmapped_area_info.align_mask / etc. if possible, to
avoid double-padding: after all, unmapped_area() also does alignment
padding. It smells like something we overlooked when initially supporting
THP.

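To make the double-padding point concrete, here is a toy user-space model,
not kernel code. All function names are hypothetical. It assumes the caller
layer pads the request by the huge page size (as I understand the THP path
to do) and that the lower layer adds its own align_mask to the search
length and then aligns with the `(align_offset - gap) & align_mask` idiom.
Treat it as a sketch of the arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model: move gap up to the next address where (addr - align_offset)
 * is a multiple of (align_mask + 1). align_mask must be 2^n - 1.
 */
static uint64_t toy_align_gap(uint64_t gap, uint64_t align_offset,
                              uint64_t align_mask)
{
    assert((align_mask & (align_mask + 1)) == 0);
    return gap + ((align_offset - gap) & align_mask);
}

/*
 * Two layers: the caller pads len by the huge page size, then the
 * lower layer pads the search length again by its own mask.
 */
static uint64_t toy_search_len_two_layers(uint64_t len, uint64_t size,
                                          uint64_t inner_mask)
{
    uint64_t len_pad = len + size;   /* layer 1: caller-side padding */
    return len_pad + inner_mask;     /* layer 2: lower-layer padding */
}

/*
 * One layer: the huge page alignment is passed down as
 * align_mask = size - 1, so the request is padded only once.
 */
static uint64_t toy_search_len_one_layer(uint64_t len, uint64_t size)
{
    return len + (size - 1);
}
```

For example, with len = 4M, size = 2M and a hypothetical 64K inner mask
(0xffff), the two-layer search length is 0x60ffff while the one-layer one
is 0x5fffff, i.e. the stacked scheme asks for a larger free gap than the
final alignment actually needs.
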
Thanks,
--
Peter Xu