Message-ID: <aFxNdDpIlx0fZoIN@x1.local>
Date: Wed, 25 Jun 2025 15:26:44 -0400
From: Peter Xu <peterx@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kvm@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
Alex Williamson <alex.williamson@...hat.com>,
Zi Yan <ziy@...dia.com>, Alex Mastro <amastro@...com>,
David Hildenbrand <david@...hat.com>,
Nico Pache <npache@...hat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED
mappings
On Wed, Jun 25, 2025 at 03:41:54PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 25, 2025 at 01:12:11PM -0400, Peter Xu wrote:
>
> > After I read the two use cases, I mostly agree. Just one trivial thing to
> > mention: it may not be the direct map but vmap() (see io_region_init_ptr()).
>
> If it is vmapped then this is all silly, you should vmap and mmap
> using the same cache colouring and, AFAIK, pgoff is how this works for
> purely userspace.
>
> Once vmap'd it should determine the cache colour and set the pgoff
> properly, then everything should already work no?
I don't yet see how to set the pgoff. Here pgoff is passed from userspace
and follows io_uring's definition (see io_uring_mmap()). For example, on
parisc one could map the completion queue at offset IORING_OFF_CQ_RING
(0x8000000), but the VA alignment then needs to match the address that
vmap() returned for the completion queue's io_mapped_region.ptr.
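A rough userspace model of the colour mismatch described above, assuming parisc's 4 MiB SHM_COLOUR span and 4 KiB pages. The helpers (sketch_same_colour(), sketch_colour_align(), sketch_pgoff_for_kva()) are hypothetical and only mimic the idea of pgoff-driven colour alignment; they are not the kernel's code. Two addresses can alias safely only when they sit at the same offset within the colour span, so a pgoff taken from the uABI offset yields a VA unrelated to the vmap() address, while a pgoff derived from the kernel pointer lines the two up.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions for this sketch: 4 KiB pages and a 4 MiB colour span. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_SHM_COLOUR 0x00400000UL

/* True when a and b share the same offset within the colour span. */
static bool sketch_same_colour(unsigned long a, unsigned long b)
{
	return ((a ^ b) & (SKETCH_SHM_COLOUR - 1)) == 0;
}

/*
 * Return a VA >= base whose offset within the colour span equals that of
 * (pgoff << PAGE_SHIFT), i.e. colour alignment driven purely by pgoff.
 */
static unsigned long sketch_colour_align(unsigned long base, unsigned long pgoff)
{
	unsigned long start = (base + SKETCH_SHM_COLOUR - 1) & ~(SKETCH_SHM_COLOUR - 1);
	unsigned long off = (pgoff << SKETCH_PAGE_SHIFT) & (SKETCH_SHM_COLOUR - 1);

	return start + off;
}

/* A pgoff whose colour matches the kernel address kva (e.g. a vmap() result). */
static unsigned long sketch_pgoff_for_kva(unsigned long kva)
{
	return kva >> SKETCH_PAGE_SHIFT;
}
```

With a hypothetical kva of 0x10123000 and a search base of 0x40000000, the uABI pgoff 0x8000 (byte offset 0x8000000) gives 0x40000000, which does not share kva's colour, whereas sketch_pgoff_for_kva(kva) gives 0x40123000, which does.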
>
> > It already does, see io_uring_get_unmapped_area() for parisc:
> >
> > /*
> > * Do not allow to map to user-provided address to avoid breaking the
> > * aliasing rules. Userspace is not able to guess the offset address of
> > * kernel kmalloc()ed memory area.
> > */
> > if (addr)
> > return -EINVAL;
> >
> > I don't know of anyone who would use MAP_FIXED with addr=0, so failing
> > addr!=0 should already stop almost all MAP_FIXED users.
>
> Maybe, but it is still not right to not check MAP_FIXED directly. And
> addr is supposed to be a hint for non-fixed mode so it is weird to
> -EINVAL when you can ignore the hint??
I agree on both points here.
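A sketch of the direction agreed on here: test MAP_FIXED explicitly and treat a non-fixed addr as a hint that may be ignored, instead of failing every addr != 0. It reuses the same assumed 4 MiB colour span. sketch_get_unmapped_area() is a hypothetical helper that returns a long (negative errno on failure) rather than using the kernel's return encoding, and accepting a fixed address only when its colour matches is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>	/* MAP_FIXED */

/* Assumption for this sketch: a 4 MiB cache colour span. */
#define SKETCH_SHM_COLOUR 0x00400000UL

/* True when a and b share the same offset within the colour span. */
static bool sketch_same_colour(unsigned long a, unsigned long b)
{
	return ((a ^ b) & (SKETCH_SHM_COLOUR - 1)) == 0;
}

/*
 * Hypothetical get_unmapped_area()-style helper for a mapping that must
 * share its colour with the kernel address kva. search_base stands in for
 * wherever the real VA allocator would start looking.
 */
static long sketch_get_unmapped_area(unsigned long addr, unsigned long flags,
				     unsigned long kva, unsigned long search_base)
{
	unsigned long start;

	if (flags & MAP_FIXED) {
		/* The address cannot be moved: accept it only if the colours match. */
		return sketch_same_colour(addr, kva) ? (long)addr : -EINVAL;
	}

	/* Not fixed: addr is only a hint, so use it if it fits, else ignore it. */
	if (addr && sketch_same_colour(addr, kva))
		return (long)addr;

	/* Fall back to a colour-matching VA at or above search_base. */
	start = (search_base + SKETCH_SHM_COLOUR - 1) & ~(SKETCH_SHM_COLOUR - 1);
	return (long)(start + (kva & (SKETCH_SHM_COLOUR - 1)));
}
```

With kva = 0x10123000, a MAP_FIXED request at 0x40000000 fails with -EINVAL, while the same address passed without MAP_FIXED is simply ignored and a colour-matching VA is chosen instead.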
>
> > Going back to the topic of this series - I think the new API would work for
> > io_uring and parisc too if I can return phys_pgoff, here what parisc would
> > need is:
>
> The best solution is to fix the selection of normal pgoff so it has
> consistent colouring of user VMAs and kernel vmaps. Either compute a
> pgoff that matches the vmap (hopefully easy if it is not uABI) or
> teach the kernel vmap how to respect a "pgoff" to set the cache
> colouring just like the user VMA's do (AFIACR).
>
> But I think this is getting maybe too big and I'd just introduce the
> new API and not try to convert this hard stuff. The above explanation of
> how it could be fixed should be enough??
I never planned to do it myself. However, if I'm going to sign off on and
propose an API, I want to be crystal clear about its goal, and about the
feasibility of that goal, even if I'm not going to work on it myself.

We don't want to introduce something, then find it doesn't work even for
some MMU use cases, and end up either maintaining both or reverting. I wish
we could have stuck with get_unmapped_area() for now and left the new API
for later.

So if we want to propose the new API here and make VFIO its first user
(while considering it applicable to at least all existing MMU users, all of
which I have checked so far), I think this would be the proper form:

    int (*mmap_va_hint)(struct file *file, unsigned long *pgoff, size_t len);

The changes compared to the previous version:

  (1) Merged the pgoff and *phys_pgoff parameters into a single
      unsigned long *pgoff, so the hook can adjust the pgoff that the VA
      allocator uses. The adjustment will not be visible to the later
      stage of mmap() that creates the VMA.

  (2) Renamed it to mmap_va_hint(), because *pgoff can now be updated, so
      it is no longer only about the order: both the "order" and the
      "pgoff adjustment" are hints that the core mm will use when
      calculating the VA.
Does it look ok to you?
--
Peter Xu