[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9733d8cf-edab-4b2b-bf2e-11457ef63dc8@lucifer.local>
Date: Fri, 13 Jun 2025 20:18:42 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Peter Xu <peterx@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, kvm@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Alex Williamson <alex.williamson@...hat.com>, Zi Yan <ziy@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>, Alex Mastro <amastro@...com>,
David Hildenbrand <david@...hat.com>, Nico Pache <npache@...hat.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
Barry Song <baohua@...nel.org>
Subject: Re: [PATCH 3/5] mm: Rename __thp_get_unmapped_area to
mm_get_unmapped_area_aligned
On Fri, Jun 13, 2025 at 02:45:31PM -0400, Peter Xu wrote:
> On Fri, Jun 13, 2025 at 04:36:57PM +0100, Lorenzo Stoakes wrote:
> > On Fri, Jun 13, 2025 at 09:41:09AM -0400, Peter Xu wrote:
> > > This function is pretty handy for any type of VMA to provide a size-aligned
> > > VMA address when mmap(). Rename the function and export it.
> >
> > This isn't a great commit message, 'to provide a size-aligned VMA address when
> > mmap()' is super unclear - do you mean 'to provide an unmapped address that is
> > also aligned to the specified size'?
>
> I sincerely don't know the difference, not a native speaker here..
> Suggestions welcomed, I can update to whatever both of us agree on.
Sure, sorry I don't mean to be pedantic I just think it would be clearer to
sort of expand upon this, as the commit message is rather short.
I think saying something like this function allows you to locate an
unmapped region which is aligned to the specified size should suffice.
>
> >
> > I think you should also specify your motive, renaming and exporting something
> > because it seems handy isn't sufficient justifiation.
> >
> > Also why would we need to export this? What modules might want to use this? I'm
> > generally not a huge fan of exporting things unless we strictly have to.
>
> It's one of the major reasons why I sent this together with the VFIO
> patches. It'll be used in VFIO patches that is in the same series. I will
> mention it in the commit message when repost.
OK cool, I've not dug through those as not my area, really it's about
having the appropriate justification.
I'm always inclined to not want us to export things by default, based on
experience of finding 'unusual' uses of various mm interfaces in drivers in
the past which have caused problems :)
But of course there are situations that warrant it, they just need to be
spelled out.
>
> >
> > >
> > > About the rename:
> > >
> > > - Dropping "THP" because it doesn't really have much to do with THP
> > > internally.
> >
> > Well the function seems specifically tailored to the THP use. I think you'll
> > need to further adjust this.
>
> Actually.. it is almost exactly what I need so far. I can justify it below.
Yeah, but it's not a general function that gives you an unmapped area that
is aligned.
It's a 'function that gets you an aligned unmapped area but only for 64-bit
kernels and when you are not invoking it from a compat syscall and returns
0 instead of errors'.
This doesn't sound general to me?
>
> >
> > >
> > > - The suffix "_aligned" imply it is a helper to generate aligned virtual
> > > address based on what is specified (which can be not PMD_SIZE).
> >
> > Ack this is sensible!
> >
> > >
> > > Cc: Zi Yan <ziy@...dia.com>
> > > Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>
> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > > Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>
> > > Cc: Ryan Roberts <ryan.roberts@....com>
> > > Cc: Dev Jain <dev.jain@....com>
> > > Cc: Barry Song <baohua@...nel.org>
> > > Signed-off-by: Peter Xu <peterx@...hat.com>
> > > ---
> > > include/linux/huge_mm.h | 14 +++++++++++++-
> > > mm/huge_memory.c | 6 ++++--
> > > 2 files changed, 17 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index 2f190c90192d..706488d92bb6 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> >
> > Why are we keeping everything in huge_mm.h, huge_memory.c if this is being made
> > generic?
> >
> > Surely this should be moved out into mm/mmap.c no?
>
> No objections, but I suggest a separate discussion and patch submission
> when the original function resides in huge_memory.c. Hope it's ok for you.
I like to be as flexible as I can be in review, but I'm afraid I'm going to
have to be annoying about this one :)
It simply makes no sense to have non-THP stuff in 'the THP file'. Also this
makes this a general memory mapping function that should live with the
other related code.
I don't really think much discussion is required here? You could do this as
2 separate commits if that'd make life easier?
Sorry to be a pain here, but I'm really allergic to our having random
unrelated things in the wrong files, it's something mm has done rather too
much...
>
> >
> > > @@ -339,7 +339,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
> > > unsigned long len, unsigned long pgoff, unsigned long flags,
> > > vm_flags_t vm_flags);
> > > -
> > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp,
> > > + unsigned long addr, unsigned long len,
> > > + loff_t off, unsigned long flags, unsigned long size,
> > > + vm_flags_t vm_flags);
> >
> > I echo Jason's comments about a kdoc and explanation of what this function does.
> >
> > > bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
> > > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > > unsigned int new_order);
> > > @@ -543,6 +546,15 @@ thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
> > > return 0;
> > > }
> > >
> > > +static inline unsigned long
> > > +mm_get_unmapped_area_aligned(struct file *filp,
> > > + unsigned long addr, unsigned long len,
> > > + loff_t off, unsigned long flags, unsigned long size,
> > > + vm_flags_t vm_flags)
> > > +{
> > > + return 0;
> > > +}
> > > +
> > > static inline bool
> > > can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
> > > {
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 4734de1dc0ae..52f13a70562f 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio)
> > > folio_test_large_rmappable(folio);
> > > }
> > >
> > > -static unsigned long __thp_get_unmapped_area(struct file *filp,
> > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp,
> > > unsigned long addr, unsigned long len,
> > > loff_t off, unsigned long flags, unsigned long size,
> > > vm_flags_t vm_flags)
> > > @@ -1132,6 +1132,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> > > ret += off_sub;
> > > return ret;
> > > }
> > > +EXPORT_SYMBOL_GPL(mm_get_unmapped_area_aligned);
> >
> > I'm not convinced about exporting this... shouldn't be export only if we
> > explicitly have a user?
> >
> > I'd rather we didn't unless we needed to.
> >
> > >
> > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
> > > unsigned long len, unsigned long pgoff, unsigned long flags,
> > > @@ -1140,7 +1141,8 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
> > > unsigned long ret;
> > > loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> > >
> > > - ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags);
> > > + ret = mm_get_unmapped_area_aligned(filp, addr, len, off, flags,
> > > + PMD_SIZE, vm_flags);
> > > if (ret)
> > > return ret;
> > >
> > > --
> > > 2.49.0
> > >
> >
> > So, you don't touch the original function but there's stuff there I think we
> > need to think about if this is generalised.
> >
> > E.g.:
> >
> > if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall())
> > return 0;
> >
> > This still valid?
>
> Yes. I want this feature (for VFIO) to not be enabled on 32bits, and not
> enabled with compat syscals.
OK, but then is this a 'general' function any more?
These checks were introduced by commit 4ef9ad19e176 ("mm: huge_memory:
don't force huge page alignment on 32 bit") and so are _absolutely
specifically_ intended for a THP use-case.
And now they _just happen_ to be useful to you but nothing about the
function name suggests that this is the case?
I mean it seems like you should be doing this check separately in both VFIO
and THP code and having the 'general 'function not do this no?
>
> >
> > /*
> > * The failure might be due to length padding. The caller will retry
> > * without the padding.
> > */
> > if (IS_ERR_VALUE(ret))
> > return 0;
> >
> > This is assuming things the (currently single) caller will do, that is no longer
> > an assumption you can make, especially if exported.
>
> It's part of core function we want from a generic helper. We want to know
> when the va allocation, after padded, would fail due to the padding. Then
> the caller can decide what to do next. It needs to fail here properly.
I'm no sure I understand what you mean?
It's not just this case, it's basically any error condition results in 0.
It's actually quite dangerous, as the get_unmapped_area() functions are
meant to return either an error value or the located address _and zero is a
valid response_.
So if somebody used this function naively, they'd potentially have a very
nasty bug occur when an error arose.
If you want to export this, I just don't think we can have this be a thing
here.
>
> >
> > Actually you maybe want to abstract the whole of thp_get_unmapped_area_vmflags()
> > no? As this has a fallback mode?
> >
> > /*
> > * Do not try to align to THP boundary if allocation at the address
> > * hint succeeds.
> > */
> > if (ret == addr)
> > return addr;
>
> This is not a fallback. This is when user specified a hint address (no
> matter with / without MAP_FIXED), if that address works then we should
> reuse that address, ignoring the alignment requirement from the driver.
> This is exactly the behavior VFIO needs, and this should also be the
> suggested behavior for whatever new drivers that would like to start using
> this generic helper.
I didn't say this was the fallback :) this just happened to be the code
underneath my comment. Sorry if that wasn't clear.
This is another kinda non-general thing but one that makes more sense. This
comment needs updating, however, obviously. You could just delete 'THP' in
the comment that'd probalby do it.
The fallback is in:
unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags,
vm_flags_t vm_flags)
{
unsigned long ret;
loff_t off = (loff_t)pgoff << PAGE_SHIFT;
ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags);
if (ret)
return ret;
So here, if ret returns an address, then it's fine we return that.
Otherwise, we invoke the below (the fallback):
return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags,
vm_flags);
}
>
> >
> > What was that about this no longer being relevant to THP? :>)
> >
> > Are all of these 'return 0' cases expected by any sensible caller? It seems like
> > it's a way for thp_get_unmapped_area_vmflags() to recognise when to fall back to
> > non-aligned?
>
> Hope above justfies everything. It's my intention to reuse everything
> here. If you have any concern on any of the "return 0" cases in the
> function being exported, please shoot, we can discuss.
Of course, I have some doubts here :)
>
> Thanks,
>
> --
> Peter Xu
>
To be clearer perhaps, what I think would work here is:
1. Remove the CONFIG_64BIT, in_compat_syscall() check and place it in THP
and VFIO code separately, as this isn't a general thing.
2. Rather than return 0 in this function, return error codes so it matches
the other mm_get_unmapped_area_*() functions.
3. Adjust thp_get_unmapped_area_vmflags() to detect the error value from
this function and do the fallback logic in this case. There's no need
for this 0 stuff (and it's possibly broken actually, since _in theory_
you can get unmapped zero).
4. (sorry :) move the code to mm/mmap.c
5. Obviously address comments from others, most importantly (in my view)
ensuring that there is a good kernel doc comment around the function.
6. Put the justifiation for exporting the function + stuff about VFIO in
the commit message + expand it a little bit as discussed.
7. Other small stuff raised above (e.g. remove 'THP' comment etc.)
Again, sorry to be a pain, but I think we need to be careful to get this
right so we don't leave any footguns for ourselves in the future with
'implicit' stuff.
Thanks!
Powered by blists - more mailing lists