Message-ID: <20250630140537.GW167785@nvidia.com>
Date: Mon, 30 Jun 2025 11:05:37 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Peter Xu <peterx@...hat.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kvm@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
Alex Williamson <alex.williamson@...hat.com>,
Zi Yan <ziy@...dia.com>, Alex Mastro <amastro@...com>,
David Hildenbrand <david@...hat.com>,
Nico Pache <npache@...hat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings

On Wed, Jun 25, 2025 at 03:26:44PM -0400, Peter Xu wrote:
> On Wed, Jun 25, 2025 at 03:41:54PM -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 25, 2025 at 01:12:11PM -0400, Peter Xu wrote:
> >
> > > After I read the two use cases, I mostly agree. Just one trivial thing to
> > > mention, it may not be direct map but vmap() (see io_region_init_ptr()).
> >
> > If it is vmapped then this is all silly, you should vmap and mmap
> > using the same cache colouring and, AFAIK, pgoff is how this works for
> > purely userspace.
> >
> > Once vmap'd it should determine the cache colour and set the pgoff
> > properly, then everything should already work no?
>
> I don't yet see how to set the pgoff. Here pgoff is passed from the
> userspace, which follows io_uring's definition (per io_uring_mmap).

That's too bad.

So you have to do it the other way and pass the pgoff to the vmap so
the vmap ends up with the same colouring as a user VMA holding the
same pages.

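
To illustrate the colouring constraint, here is a minimal userspace
sketch, not kernel code. It assumes a hypothetical aliasing granule of
four 4 KiB pages (the real value is per-architecture, SHMLBA), and the
sketch_* names are made up for this example. The idea is that two
mappings of the same page are alias-safe when their addresses share
the colour implied by pgoff, so the address is rounded up to a colour
boundary and the pgoff colour is added back:

```c
#include <assert.h>

/* Hypothetical values for illustration; real ones are per-architecture. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_SHMLBA     (4ul << SKETCH_PAGE_SHIFT) /* assume 4 page colours */

/* Colour of an address: its page index modulo the number of colours. */
static unsigned long sketch_colour_of(unsigned long addr)
{
	return (addr & (SKETCH_SHMLBA - 1)) >> SKETCH_PAGE_SHIFT;
}

/*
 * Round addr up to a colour boundary, then add the colour implied by
 * pgoff, so the result has the same colour as any other mapping that
 * places pgoff the same way.
 */
static unsigned long sketch_colour_align(unsigned long addr, unsigned long pgoff)
{
	unsigned long base, off;

	/* The masking below only works for a power-of-two granule. */
	assert((SKETCH_SHMLBA & (SKETCH_SHMLBA - 1)) == 0);

	base = (addr + SKETCH_SHMLBA - 1) & ~(SKETCH_SHMLBA - 1);
	off = (pgoff << SKETCH_PAGE_SHIFT) & (SKETCH_SHMLBA - 1);
	return base + off;
}
```

If both the vmap address and the user VMA address were chosen this way
from the same pgoff, sketch_colour_of() would return the same value for
both.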
> So if we want the new API to be proposed here, and make VFIO use it first
> (while considering it applicable to at least all existing MMU users,
> all of which I have checked so far), I'd think this is proper:
>
> int (*mmap_va_hint)(struct file *file, unsigned long *pgoff, size_t len);
>
> The changes compared to the previous version:
>
> (1) merged pgoff and *phys_pgoff parameters into one unsigned long, so
> the hook can adjust the pgoff for the va allocator to be used. The
> adjustment will not be visible to future mmap() when VMA is created.
It seems functional, but the above is better, IMHO.
> (2) I renamed it to mmap_va_hint(), because *pgoff will be able to be
> updated, so it's not only about ordering, but "order" and "pgoff
> adjustment" hints that the core mm will use when calculating the VA.

Where does the order come back though? Is it the return value?

It seems viable.
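
For concreteness, a userspace sketch of how such a hook could be
consumed. Everything here is an assumption for illustration: the
return value is taken to be the order (the open question above), the
sketch_* names and struct sketch_file are hypothetical, and 4 KiB
pages with an order-9 (2 MiB) huge mapping are assumed. The hook
rewrites *pgoff into the offset the VA allocator should align
against, and the allocator then picks a VA that is congruent to that
pgoff within an order-sized block:

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PMD_ORDER  9 /* assume 2 MiB huge mappings, 4 KiB pages */

/* Hypothetical stand-in for the driver state behind struct file. */
struct sketch_file {
	unsigned long phys_base_pgoff; /* where the region starts, in pages */
};

/*
 * Hypothetical hook: adjust *pgoff for the VA allocator and return the
 * largest mapping order worth aligning for (0 means no preference).
 */
static int sketch_mmap_va_hint(const struct sketch_file *file,
			       unsigned long *pgoff, size_t len)
{
	*pgoff += file->phys_base_pgoff;
	if (len >= ((size_t)1 << (SKETCH_PMD_ORDER + SKETCH_PAGE_SHIFT)))
		return SKETCH_PMD_ORDER;
	return 0;
}

/*
 * Hypothetical allocator side: lowest VA >= floor whose offset inside
 * an order-sized block matches the offset of pgoff in such a block.
 */
static unsigned long sketch_pick_va(unsigned long floor,
				    unsigned long pgoff, int order)
{
	unsigned long align = 1ul << (order + SKETCH_PAGE_SHIFT);
	unsigned long want = (pgoff << SKETCH_PAGE_SHIFT) & (align - 1);
	unsigned long va = (floor & ~(align - 1)) + want;

	if (va < floor)
		va += align;
	return va;
}
```

The adjusted pgoff is only used to pick the VA in this sketch; the
pgoff recorded for the VMA would stay whatever userspace passed in.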
Jason