Message-ID: <20250625130711.GH167785@nvidia.com>
Date: Wed, 25 Jun 2025 10:07:11 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Peter Xu <peterx@...hat.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	kvm@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
	Alex Williamson <alex.williamson@...hat.com>,
	Zi Yan <ziy@...dia.com>, Alex Mastro <amastro@...com>,
	David Hildenbrand <david@...hat.com>,
	Nico Pache <npache@...hat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED
 mappings

On Tue, Jun 24, 2025 at 08:48:45PM -0400, Peter Xu wrote:
> > My feeling, and the reason I used the phrase "pgoff aligned address",
> > is that the owner of the file should already ensure that for the large
> > PTEs/folios:
> >  pgoff % 2**order == 0
> >  physical % 2**order == 0
> 
> IMHO there shouldn't really be any hard requirement in mm that pgoff and
> physical address space need to be aligned.. but I confess I don't have an
> example driver that didn't do that in the linux tree.

Well, maybe, but right now there does seem to be for
THP/hugetlbfs/etc. It is a nice simple solution that exposes the
alignment requirements to userspace if it wants to use MAP_FIXED.
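
As a rough illustration of the alignment rule quoted above, here is a
minimal userspace sketch. It assumes "physical" is expressed as a page
frame number so that both values are in units of pages; the function
name order_aligned() is hypothetical and not an existing kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API.
 * Returns true when both the file offset (pgoff, in pages) and the
 * physical page frame number (pfn, assumed unit) sit on a 2^order-page
 * boundary, i.e. pgoff % 2**order == 0 and physical % 2**order == 0.
 */
bool order_aligned(uint64_t pgoff, uint64_t pfn, unsigned int order)
{
	assert(order < 64);	/* keep the shift below well defined */

	uint64_t pages = (uint64_t)1 << order;

	return (pgoff % pages == 0) && (pfn % pages == 0);
}
```

With order 9 (a 2M mapping on 4K pages) a file offset of 512 pages
backed by pfn 0x200 passes, while pgoff 513 does not, so userspace
picking a MAP_FIXED address only has to look at the pgoff it passes in.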

> > To me this just keeps things simpler. I guess if someone comes up with
> > a case where they really can't get a pgoff alignment and really need a
> > high order mapping then maybe we can add a new return field of some
> > kind (pgoff adjustment?) but that is so weird I'd leave it to the
> > future person to come and justify it.
> 
> When looking more, I also found some special cased get_unmapped_area() that
> may not be trivially converted into the new API even for CONFIG_MMU, namely:
> 
> - io_uring_get_unmapped_area
> - arena_get_unmapped_area (from bpf_map->ops->map_get_unmapped_area)
> 
> I'll need to have some closer look tomorrow.  If any of them cannot be 100%
> safely converted to the new API, I'd also think we should not introduce the
> new API, but reuse get_unmapped_area() until we know a way out.

Oh yuk. It is trying to avoid the dcache flush on some kernel paths
for virtually tagged cache systems.

Arguably this fixup should not be in io_uring, but conveying the right
information to the core code, and requesting a special flush
avoidance mapping is not so easy.

But again I suspect the pgoff is the right solution.

IIRC this is handled by forcing a few low virtual address bits to
always match across all user mappings (the colour) via the pgoff. This
way the userspace always uses the same cache tag and doesn't become
cache incoherent. ie:

   user_addr % (PAGE_SIZE*N) == (pgoff*PAGE_SIZE) % (PAGE_SIZE*N)

The issue is now the kernel is using the direct map and we can't force
a random jumble of pages to have the right colours to match
userspace. So the kernel has all those dcache flushes sprinkled about
before it touches user mapped memory through the direct map as the
kernel will use a different colour and cache tag.

So.. if io_uring selects a pgoff that automatically gives the right
colour for the userspace mapping to also match the kernel mapping's
colour then things should just work.
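
To make that concrete, a minimal userspace sketch of the colour
arithmetic, under some loud assumptions: 4K pages, a colour window of N
pages passed in as n_colours, and pgoff counted in pages. The names
same_colour() and pgoff_for_colour() are hypothetical, not kernel
functions, and this is not how the io_uring code is actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4K pages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/*
 * Hypothetical check: does a user virtual address have the colour
 * implied by pgoff? Both sides are reduced modulo the colour window
 * of n_colours pages.
 */
bool same_colour(uint64_t user_addr, uint64_t pgoff, uint64_t n_colours)
{
	assert(n_colours != 0);

	uint64_t window = PAGE_SIZE * n_colours;

	return user_addr % window == (pgoff % n_colours) * PAGE_SIZE;
}

/*
 * Hypothetical selection: the smallest pgoff whose colour equals that
 * of a kernel (direct map) address for the same page.
 */
uint64_t pgoff_for_colour(uint64_t kaddr, uint64_t n_colours)
{
	assert(n_colours != 0);

	return (kaddr >> PAGE_SHIFT) % n_colours;
}
```

Any user address that satisfies same_colour() for the pgoff returned
by pgoff_for_colour() then shares its low colour bits with the kernel
address, which is the property the flush avoidance depends on.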

Frankly I'm shocked that someone invested time in trying to make this
work - the commit log says it was for parisc and only 2 years ago :(

d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements")

I thought such virtually tagged cache systems were long ago dead and
buried..

Shouldn't this entirely reject MAP_FIXED too?

Jason
