Message-ID: <d6a79fa6-dcc1-4181-9946-940a91c0b1f2@redhat.com>
Date: Tue, 10 Dec 2024 22:40:15 +0100
From: David Hildenbrand <david@...hat.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>, Zi Yan <ziy@...dia.com>,
Vlastimil Babka <vbabka@...e.cz>, Yu Zhao <yuzhao@...gle.com>
Subject: Re: [PATCH v2 1/2] mm/page_alloc: conditionally split >
pageblock_order pages in free_one_page() and move_freepages_block_isolate()
On 10.12.24 22:16, Johannes Weiner wrote:
> On Tue, Dec 10, 2024 at 11:29:52AM +0100, David Hildenbrand wrote:
>> @@ -1225,27 +1225,53 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>> spin_unlock_irqrestore(&zone->lock, flags);
>> }
>>
>> -/* Split a multi-block free page into its individual pageblocks. */
>> -static void split_large_buddy(struct zone *zone, struct page *page,
>> - unsigned long pfn, int order, fpi_t fpi)
>> +static bool pfnblock_migratetype_equal(unsigned long pfn,
>> + unsigned long end_pfn, int mt)
>> {
>> - unsigned long end = pfn + (1 << order);
>> + VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | end_pfn, pageblock_nr_pages));
>>
>> + while (pfn != end_pfn) {
>> + struct page *page = pfn_to_page(pfn);
>> +
>> + if (unlikely(mt != get_pfnblock_migratetype(page, pfn)))
>> + return false;
>> + pfn += pageblock_nr_pages;
>> + }
>> + return true;
>> +}
>> +
>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>> + unsigned long pfn, int order, fpi_t fpi_flags)
>> +{
>> + const unsigned long end_pfn = pfn + (1 << order);
>> + int mt = get_pfnblock_migratetype(page, pfn);
>> +
>> + VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>> /* Caller removed page from freelist, buddy info cleared! */
>> VM_WARN_ON_ONCE(PageBuddy(page));
>>
>> - if (order > pageblock_order)
>> - order = pageblock_order;
>> + /*
>> + * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>> + * pages that cover pageblocks with different migratetypes; for example
>> + * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>> + * case, fallback to freeing individual pageblocks so they get put
>> + * onto the right lists.
>> + */
>> + if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>> + likely(order <= pageblock_order) ||
>> + pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>> + __free_one_page(page, pfn, zone, order, mt, fpi_flags);
>> + return;
>> + }
Hi Johannes,
>
> Ok, if memory isolation is disabled, we know the migratetypes are all
> matching up, and we can skip the check. However, if memory isolation
> is enabled, but this isn't move_freepages_block_isolate() calling, we
> still do the check unnecessarily and slow down the boot, no?
Yes, although on most machines (x86) it's one additional pageblock check;
on some it's a few more (e.g., 3 on s390x).
As mentioned:
"
In the future, we might want to assume that all pageblocks are equal if
zone->nr_isolate_pageblock == 0; however, that will require some
zone->nr_isolate_pageblock accounting changes, such that we are
guaranteed to see zone->nr_isolate_pageblock != 0 when there is an
isolated pageblock.
"
With that, boot time wouldn't suffer in any significant way.
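
To illustrate (untested, and assuming the accounting change above, such
that zone->nr_isolate_pageblock is guaranteed to be non-zero whenever a
pageblock is isolated; an accessor would be needed for
!CONFIG_MEMORY_ISOLATION builds), the check in
__free_one_page_maybe_split() could then short-circuit on the zone
counter:

	if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
	    likely(order <= pageblock_order) ||
	    /* No isolated pageblocks -> all migratetypes are equal. */
	    !zone->nr_isolate_pageblock ||
	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
		return;
	}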
>
> Having a function guess the caller is a bit of an anti-pattern. The
> resulting code is hard to follow, and it's very easy to
> unintentionally burden some cases with unnecessary stuff. It's better
> to unshare paths until you don't need conditionals like this.
>
> In addition to the fastpath, I think you're also punishing the
> move_freepages_block_isolate() case. We *know* we just changed the
> type of one of the buddy's blocks, and yet you're still checking the
> range again to decide whether to split.
Yes, that's not ideal, and it would be easy to unshare that case (call
the "split" function instead of a "maybe_split" function).
I am not 100% sure, though, whether move_freepages_block_isolate() can
always decide "I really have a mixture"; that code is simply quite advanced :)
>
> All of this to accommodate hugetlb, which might not even be compiled
> in? Grrrr.
Yup. But at the same time, it's frequently compiled in yet never used
(or barely used; I mean, how often do people actually free 1 GiB hugetlb
pages compared to ordinary pages?).
>
> Like you, I was quite surprised to see that GFP_COMP patch in the
> buddy hotpath splitting *everything* into blocks - on the off chance
> that somebody might free a hugetlb page. Even if !CONFIG_HUGETLB. Just
> - what the hell. We shouldn't merge "I only care about my niche
> usecase at the expense of literally everybody else" patches like this.
From talking to Willy, the whole __GFP_COMP stuff might get removed
again sooner or later, once we hand out pages with a frozen refcount
from alloc_contig_range(). It might take a while, though.
>
> My vote is NAK on this patch, and a retro-NAK on the GFP_COMP patch.
I won't fight for this patch *if* the GFP_COMP patch gets reverted: it
improves the situation, and it can be improved upon further.
But if the GFP_COMP patch doesn't get reverted, we have to think about
something else.
>
> The buddy allocator operates on the assumption of MAX_PAGE_ORDER. If
> we support folios of a larger size sourced from other allocators, then
> it should be the folio layer discriminating. So if folio_put() detects
> this is a massive alloc_contig chunk, then it should take a different
> freeing path. Do the splitting in there, then pass valid chunks back
> to the buddy. That would keep the layering cleaner and the cornercase
> overhead out of the allocator fastpath.
That might be better, although I assume it's not completely trivial.
How to cleanly handle the "MAX_PAGE_ORDER page is getting freed but one
pageblock is isolated" case is a bit of a head scratcher, at least to
me. But I suspect we had it fully working before the GFP_COMP patch.
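
Something like this, perhaps (untested sketch, hypothetical helper name;
assumes each MAX_PAGE_ORDER chunk was prepared as an ordinary refcounted,
non-compound page before being handed back):

	/*
	 * Split an alloc_contig chunk larger than MAX_PAGE_ORDER in the
	 * folio freeing path and hand the pieces back to the buddy, so
	 * the allocator fastpath never sees anything bigger.
	 */
	static void free_oversized_contig_folio(struct folio *folio)
	{
		unsigned long pfn = folio_pfn(folio);
		const unsigned long end_pfn = pfn + folio_nr_pages(folio);

		for (; pfn != end_pfn; pfn += MAX_ORDER_NR_PAGES)
			__free_pages(pfn_to_page(pfn), MAX_PAGE_ORDER);
	}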
--
Cheers,
David / dhildenb