[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250617115948.GV1174925@nvidia.com>
Date: Tue, 17 Jun 2025 08:59:48 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: kevin.tian@...el.com, shuah@...nel.org, joao.m.martins@...cle.com,
steven.sistare@...cle.com, iommu@...ts.linux.dev,
linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org,
thomas.weissschuh@...utronix.de
Subject: Re: [PATCH rc 1/4] iommufd/selftest: Fix iommufd_dirty_tracking with
large hugepage sizes
On Mon, Jun 16, 2025 at 07:02:08PM -0700, Nicolin Chen wrote:
> On Mon, Jun 16, 2025 at 01:25:01PM -0300, Jason Gunthorpe wrote:
> > On Sun, Jun 15, 2025 at 10:02:03PM -0700, Nicolin Chen wrote:
> > > FIXTURE_TEARDOWN(iommufd_dirty_tracking)
> > > {
> > > - munmap(self->buffer, variant->buffer_size);
> > > - munmap(self->bitmap, DIV_ROUND_UP(self->bitmap_size, BITS_PER_BYTE));
> > > + unsigned long size = variant->buffer_size;
> > > +
> > > + if (variant->hugepages)
> > > + size = __ALIGN_KERNEL(variant->buffer_size, HUGEPAGE_SIZE);
> > > + munmap(self->buffer, size);
> > > + free(self->buffer);
> > > + free(self->bitmap);
> > > teardown_iommufd(self->fd, _metadata);
> >
> > munmap followed by free isn't right..
>
> You are right. I re-checked with Copilot. It says the same thing.
> I think the whole posix_memalign() + mmap() confuses me..
>
> Yet, should the bitmap pair with free() since it's allocated by a
> posix_memalign() call?
>
> > This code is using the glibc allocator to get a bunch of pages mmap'd
> > to an aligned location then replacing the pages with MAP_SHARED and
> > maybe HAP_HUGETLB versions.
>
> And I studied some use cases from Copilot. It says that, to use
> the combination of posix_memalign+mmap, we should do:
> aligned_ptr = posix_memalign(pagesize, pagesize);
> unmap(aligned_ptr, pagesize);
> mapped = mmap(aligned_ptr, pagesize, PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> munmap(mapped, pagesize);
> // No free() after munmap().
> ---breakdown---
> Before `posix_memalign()`:
> [ heap memory unused ]
>
> After `posix_memalign()`:
> [ posix_memalign() memory ] ← managed by malloc/free
> ↑ aligned_ptr
>
> After `munmap(aligned_ptr)`:
> [ unmapped memory ] ← allocator no longer owns it
Incorrect. The allocator has no idea about the munmap and munmap
doesn't disturb any of the allocator tracking structures.
> After `mmap(aligned_ptr, ..., MAP_FIXED)`:
> [ anonymous mmap region ] ← fully remapped, under your control
> ↑ mapped
> ---end---
No, this is wrong.
> It points out that the heap bookkeeping will be silently clobbered
> without the munmap() in-between (like we are doing):
Nope, doesn't work like that.
> ---breakdown---
> After `posix_memalign()`:
> [ posix_memalign() memory ] ← malloc thinks it owns this
>
> Then `mmap(aligned_ptr, ..., MAP_FIXED)`:
> [ anonymous mmap region ] ← malloc still thinks it owns this (!)
> ↑ mapped
> ---end---
Yes, this is correct and what we are doing here. The allocator always
owns it and we are just replacing the memory with a different mmap.
> It also gives a simpler solution for a memory that is not huge
> page backed but huge page aligned (our !variant->hugepage case):
> ---code---
> void *ptr;
> size_t alignment = 2 * 1024 * 1024; // or whatever HUGEPAGE_SIZE was
> size_t size = variant->buffer_size;
>
> // Step 1: Use posix_memalign to get an aligned pointer
> if (posix_memalign(&ptr, alignment, size) != 0) {
> perror("posix_memalign");
> return -1;
> }
Also no, the main point of this is to inject MAP_SHARED which
posix_memalign cannot not do.
> Also, for a huge page case, there is no need of posix_memalign():
> "Hugepages are not part of the standard heap, so allocator functions
> like posix_memalign() or malloc() don't help and can even get in the
> way."
> Instead, it suggests a cleaner version without posix_memalign():
> ---code---
> void *addr = mmap(NULL, variant->buffer_size, PROT_READ | PROT_WRITE,
> MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
> -1, 0);
> if (addr == MAP_FAILED) { perror("mmap"); return -1; }
> ---end---
Yes, we could do this only for MAP_HUGETLB, but it doesn't help the
normal case with MAP_SHARED.
So I would leave it alone, use the version I showed.
Jason
Powered by blists - more mailing lists