Message-ID: <20200716081243.GA6561@willie-the-truck>
Date: Thu, 16 Jul 2020 09:12:43 +0100
From: Will Deacon <will@...nel.org>
To: Mike Kravetz <mike.kravetz@...cle.com>
Cc: Barry Song <song.bao.hua@...ilicon.com>,
Anshuman Khandual <anshuman.khandual@....com>,
Catalin Marinas <catalin.marinas@....com>, x86@...nel.org,
linux-kernel@...r.kernel.org, linuxarm@...wei.com,
linux-mm@...ck.org, Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
"H.Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
akpm@...ux-foundation.org, Mike Rapoport <rppt@...ux.ibm.com>,
Roman Gushchin <guro@...com>,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory

On Wed, Jul 15, 2020 at 09:59:24AM -0700, Mike Kravetz wrote:
> On 7/15/20 1:18 AM, Will Deacon wrote:
> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> index f24acb3af741..a0007d1d12d2 100644
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -3273,6 +3273,9 @@ void __init hugetlb_add_hstate(unsigned int order)
> >>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> >>  					huge_page_size(h)/1024);
> >
> > (nit: you can also make hugetlb_cma_reserve() static and remove its function
> > prototype from hugetlb.h)
>
> Yes thanks. I threw this together pretty quickly.
>
> >
> >> +	if (order >= MAX_ORDER && hugetlb_cma_size)
> >> +		hugetlb_cma_reserve(order);
> >
> > Although I really like the idea of moving this out of the arch code, I don't
> > quite follow the check against MAX_ORDER here -- it looks like a bit of a
> > hack to try to intercept the "PUD_SHIFT - PAGE_SHIFT" order which we
> > currently pass to hugetlb_cma_reserve(). Maybe we could instead have
> > something like:
> >
> > #ifndef HUGETLB_CMA_ORDER
> > #define HUGETLB_CMA_ORDER (PUD_SHIFT - PAGE_SHIFT)
> > #endif
> >
> > and then just do:
> >
> > 	if (order == HUGETLB_CMA_ORDER)
> > 		hugetlb_cma_reserve(order);
> >
> > ? Is there something else I'm missing?
> >
>
> Well, the current hugetlb CMA code only kicks in for gigantic pages as
> defined by the hugetlb code. For example, the code to allocate a page
> from CMA is in the routine alloc_gigantic_page(). alloc_gigantic_page()
> is called from alloc_fresh_huge_page() which starts with:
>
> 	if (hstate_is_gigantic(h))
> 		page = alloc_gigantic_page(h, gfp_mask, nid, nmask);
> 	else
> 		page = alloc_buddy_huge_page(h, gfp_mask,
> 				nid, nmask, node_alloc_noretry);
>
> and, hstate_is_gigantic is,
>
> static inline bool hstate_is_gigantic(struct hstate *h)
> {
> 	return huge_page_order(h) >= MAX_ORDER;
> }
>
> So, everything in the existing code really depends on the hugetlb definition
> of gigantic page (order >= MAX_ORDER). The code to check for
> 'order >= MAX_ORDER' in my proposed patch is just following the same
> convention.

Fair enough, and thanks for the explanation. Maybe just chuck a comment in,
then? Alternatively, having something like:

static inline bool page_order_is_gigantic(unsigned int order)
{
	return order >= MAX_ORDER;
}

static inline bool hstate_is_gigantic(struct hstate *h)
{
	return page_order_is_gigantic(huge_page_order(h));
}

and then using page_order_is_gigantic() to gate the call to
hugetlb_cma_reserve()? Dunno, maybe it's overkill. Up to you.
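
If you did go that way, the call site in hugetlb_add_hstate() could then
read something like this (untested sketch):

	/*
	 * hugetlb CMA is only used for gigantic pages, i.e. orders the
	 * buddy allocator cannot satisfy.
	 */
	if (page_order_is_gigantic(order) && hugetlb_cma_size)
		hugetlb_cma_reserve(order);
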
> I think the current dependency on the hugetlb definition of gigantic page
> may be too simplistic if using CMA for hugetlb pages becomes more common.
> Some architectures (sparc, powerpc) have more than one gigantic page size.
> Currently there is no way to specify that CMA should be used for one and
> not the other. In addition, I could imagine someone wanting to reserve/use
> CMA for non-gigantic (PMD) sized pages. There is no mechanism for that today.
>
> I honestly have not heard about many use cases for this CMA functionality.
> When support was initially added, it was driven by a specific use case and
> the 'all gigantic pages use CMA if defined' implementation was deemed
> sufficient. If there are more use cases, or this seems too simple, we can
> revisit that decision.

Agreed, I think your patch is an improvement regardless of that.
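
FWIW, if per-size CMA control does become necessary, I suppose we could
hang something off the hstate, e.g. (hypothetical sketch; the field and
helper names are made up):

	struct hstate {
		...
		/* Should pages of this size come from the CMA area? */
		bool use_cma;
		...
	};

	static inline bool hstate_uses_cma(struct hstate *h)
	{
		return h->use_cma;
	}

but that can wait until somebody turns up with a real use case.
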
Will