linux-kernel - Re: [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block

Message-ID: <d6a79fa6-dcc1-4181-9946-940a91c0b1f2@redhat.com>
Date: Tue, 10 Dec 2024 22:40:15 +0100
From: David Hildenbrand <david@...hat.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 Andrew Morton <akpm@...ux-foundation.org>, Zi Yan <ziy@...dia.com>,
 Vlastimil Babka <vbabka@...e.cz>, Yu Zhao <yuzhao@...gle.com>
Subject: Re: [PATCH v2 1/2] mm/page_alloc: conditionally split >
 pageblock_order pages in free_one_page() and move_freepages_block_isolate()

On 10.12.24 22:16, Johannes Weiner wrote:
> On Tue, Dec 10, 2024 at 11:29:52AM +0100, David Hildenbrand wrote:
>> @@ -1225,27 +1225,53 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>   	spin_unlock_irqrestore(&zone->lock, flags);
>>   }
>>   
>> -/* Split a multi-block free page into its individual pageblocks. */
>> -static void split_large_buddy(struct zone *zone, struct page *page,
>> -			      unsigned long pfn, int order, fpi_t fpi)
>> +static bool pfnblock_migratetype_equal(unsigned long pfn,
>> +		unsigned long end_pfn, int mt)
>>   {
>> -	unsigned long end = pfn + (1 << order);
>> +	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | end_pfn, pageblock_nr_pages));
>>   
>> +	while (pfn != end_pfn) {
>> +		struct page *page = pfn_to_page(pfn);
>> +
>> +		if (unlikely(mt != get_pfnblock_migratetype(page, pfn)))
>> +			return false;
>> +		pfn += pageblock_nr_pages;
>> +	}
>> +	return true;
>> +}
>> +
>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>> +		unsigned long pfn, int order, fpi_t fpi_flags)
>> +{
>> +	const unsigned long end_pfn = pfn + (1 << order);
>> +	int mt = get_pfnblock_migratetype(page, pfn);
>> +
>> +	VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>>   	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>   	/* Caller removed page from freelist, buddy info cleared! */
>>   	VM_WARN_ON_ONCE(PageBuddy(page));
>>   
>> -	if (order > pageblock_order)
>> -		order = pageblock_order;
>> +	/*
>> +	 * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>> +	 * pages that cover pageblocks with different migratetypes; for example
>> +	 * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>> +	 * case, fallback to freeing individual pageblocks so they get put
>> +	 * onto the right lists.
>> +	 */
>> +	if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>> +	    likely(order <= pageblock_order) ||
>> +	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>> +		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
>> +		return;
>> +	}

Hi Johannes,

> 
> Ok, if memory isolation is disabled, we know the migratetypes are all
> matching up, and we can skip the check. However, if memory isolation
> is enabled, but this isn't move_freepages_block_isolate() calling, we
> still do the check unnecessarily and slow down the boot, no?

Yes, although on most machines (e.g., x86) it's only one additional 
pageblock check; on some it's a bit more (e.g., 3 on s390x).
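To make those numbers concrete: for a buddy of order MAX_PAGE_ORDER, the check walks every pageblock after the first one (pfnblock_migratetype_equal() starts at pfn + pageblock_nr_pages), i.e. 2^(MAX_PAGE_ORDER - pageblock_order) - 1 pageblocks. This is a small arithmetic sketch, assuming 4 KiB base pages, MAX_PAGE_ORDER = 10, and pageblock_order = 9 on x86-64 versus 8 on s390x; the helper names are hypothetical, not kernel functions.

```c
#include <assert.h>

/*
 * Hypothetical helpers: how many pageblocks one buddy of order
 * max_page_order covers, and how many of them the proposed check reads
 * beyond the first (whose migratetype is fetched anyway).
 * Assumed configs: x86-64 (10, 9), s390x (10, 8), 4 KiB base pages.
 */
static unsigned long pageblocks_per_max_order(int max_page_order,
					      int pageblock_order)
{
	assert(max_page_order >= pageblock_order);
	return 1UL << (max_page_order - pageblock_order);
}

static unsigned long extra_pageblock_checks(int max_page_order,
					    int pageblock_order)
{
	/* The walk starts at the second pageblock of the buddy. */
	return pageblocks_per_max_order(max_page_order, pageblock_order) - 1;
}
```

With those assumed values, extra_pageblock_checks(10, 9) is 1 and extra_pageblock_checks(10, 8) is 3, which lines up with the x86 and s390x figures above.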

As mentioned:

"
In the future, we might want to assume that all pageblocks are equal if
zone->nr_isolate_pageblock == 0; however, that will require some
zone->nr_isolate_pageblock accounting changes, such that we are
guaranteed to see zone->nr_isolate_pageblock != 0 when there is an
isolated pageblock.
"

With that, boot time wouldn't suffer in any significant way.
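A userspace sketch of that possible shortcut, assuming zone->nr_isolate_pageblock is exact (never 0 while any pageblock in the zone is isolated), which is exactly the accounting change the quote says would still be needed. struct model_zone and the model_* helpers are hypothetical; model_mt_equal() mirrors the loop in the quoted pfnblock_migratetype_equal().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not the kernel's struct zone. */
#define MODEL_PAGEBLOCK_ORDER 9
#define MODEL_PAGEBLOCK_NR_PAGES (1UL << MODEL_PAGEBLOCK_ORDER)

enum model_mt { MODEL_MOVABLE, MODEL_ISOLATE };

struct model_zone {
	/* Assumed exact: never 0 while any pageblock is isolated. */
	unsigned long nr_isolate_pageblock;
	const enum model_mt *block_mt;	/* one migratetype per pageblock */
};

static enum model_mt model_block_mt(const struct model_zone *z,
				    unsigned long pfn)
{
	return z->block_mt[pfn / MODEL_PAGEBLOCK_NR_PAGES];
}

/* Same walk as pfnblock_migratetype_equal() in the quoted patch. */
static bool model_mt_equal(const struct model_zone *z, unsigned long pfn,
			   unsigned long end_pfn, enum model_mt mt)
{
	assert(pfn % MODEL_PAGEBLOCK_NR_PAGES == 0);
	assert(end_pfn % MODEL_PAGEBLOCK_NR_PAGES == 0);
	for (; pfn != end_pfn; pfn += MODEL_PAGEBLOCK_NR_PAGES)
		if (model_block_mt(z, pfn) != mt)
			return false;
	return true;
}

/* Returns true if a buddy of this order must be freed per pageblock. */
static bool model_need_split(const struct model_zone *z, unsigned long pfn,
			     int order)
{
	unsigned long end_pfn = pfn + (1UL << order);
	enum model_mt mt = model_block_mt(z, pfn);

	if (order <= MODEL_PAGEBLOCK_ORDER)
		return false;
	/* Hypothetical shortcut: nothing isolated, so all blocks match. */
	if (z->nr_isolate_pageblock == 0)
		return false;
	return !model_mt_equal(z, pfn + MODEL_PAGEBLOCK_NR_PAGES, end_pfn, mt);
}
```

The point of the shortcut is that the common case (no isolation in progress) becomes a single counter read, and the pageblock walk only happens while something in the zone is actually isolated.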

> 
> Having a function guess the caller is a bit of an anti-pattern. The
> resulting code is hard to follow, and it's very easy to
> unintentionally burden some cases with unnecessary stuff. It's better
> to unshare paths until you don't need conditionals like this.
>
> In addition to the fastpath, I think you're also punishing the
> move_freepages_block_isolate() case. We *know* we just changed the
> type of one of the buddy's blocks, and yet you're still checking the
> range again to decide whether to split.

Yes, that's not ideal, and it would be easy to unshare that case (call 
the "split" function instead of a "maybe_split" function).
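Roughly what the unshared "split" variant could look like, as a userspace model: the caller already knows the buddy spans mixed migratetypes, so nothing is re-checked and each pageblock is freed with its own migratetype. The model_* names and the fixed-size log are hypothetical stand-ins for __free_one_page(), assuming pageblock_order = 9.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGEBLOCK_ORDER 9
#define MODEL_PAGEBLOCK_NR_PAGES (1UL << MODEL_PAGEBLOCK_ORDER)

/* Hypothetical record of one call into a model __free_one_page(). */
struct model_free {
	unsigned long pfn;
	int order;
	int mt;
};

struct model_log {
	struct model_free entries[16];
	size_t n;
};

static void model_free_one_page(struct model_log *log, unsigned long pfn,
				int order, int mt)
{
	assert(log->n < sizeof(log->entries) / sizeof(log->entries[0]));
	log->entries[log->n++] = (struct model_free){ pfn, order, mt };
}

/*
 * "Split" variant for a caller such as move_freepages_block_isolate()
 * that already knows the migratetypes differ: no re-check, just free
 * each pageblock onto the list of its own migratetype.
 */
static void model_free_one_page_split(struct model_log *log,
				      const int *block_mt, unsigned long pfn,
				      int order)
{
	unsigned long end_pfn = pfn + (1UL << order);

	assert(order > MODEL_PAGEBLOCK_ORDER);
	assert(pfn % (1UL << order) == 0);
	for (; pfn != end_pfn; pfn += MODEL_PAGEBLOCK_NR_PAGES)
		model_free_one_page(log, pfn, MODEL_PAGEBLOCK_ORDER,
				    block_mt[pfn / MODEL_PAGEBLOCK_NR_PAGES]);
}
```

The "maybe_split" path would then only be taken by callers that genuinely don't know, rather than by everyone.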

I am not 100% sure, though, whether move_freepages_block_isolate() can 
always decide "I really have a mixture"; that code is simply quite 
advanced :)

> 
> All of this to accommodate hugetlb, which might not even be compiled
> in? Grrrr.

Yup. But at the same time, it's frequently compiled in but never used 
(or barely used; I mean, how often do people actually free 1 GiB hugetlb 
pages compared to ordinary pages?).

> 
> Like you, I was quite surprised to see that GFP_COMP patch in the
> buddy hotpath splitting *everything* into blocks - on the off chance
> that somebody might free a hugetlb page. Even if !CONFIG_HUGETLB. Just
> - what the hell. We shouldn't merge "I only care about my niche
> usecase at the expense of literally everybody else" patches like this.

After talking to Willy: the whole __GFP_COMP handling might get removed 
again sooner or later, once alloc_contig_range() hands out pages with a 
frozen refcount. That might take a while, though.

> 
> My vote is NAK on this patch, and a retro-NAK on the GFP_COMP patch.

I won't fight for this patch *if* the GFP_COMP patch gets reverted. This 
patch improves the situation, and it can be improved further still.

But if it doesn't get reverted, we have to think about something else.

> 
> The buddy allocator operates on the assumption of MAX_PAGE_ORDER. If
> we support folios of a larger size sourced from other allocators, then
> it should be the folio layer discriminating. So if folio_put() detects
> this is a massive alloc_contig chunk, then it should take a different
> freeing path. Do the splitting in there, then pass valid chunks back
> to the buddy. That would keep the layering cleaner and the corner-case
> overhead out of the allocator fastpath.

That might be better, although I assume it's not completely trivial.

How to cleanly handle the "MAX_PAGE_ORDER page is getting freed but one 
pageblock is isolated" case is a bit of a head-scratcher, at least to 
me. But I suspect we had it fully working before the GFP_COMP patch.
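A rough model of the folio-layer split Johannes suggests, under stated assumptions: 4 KiB pages and MAX_PAGE_ORDER = 10, so a 1 GiB hugetlb folio is order 18 and would reach the buddy as 256 order-10 pieces. model_folio_free() and model_buddy_free() are hypothetical names, not existing kernel functions, and the sketch deliberately does not solve the mixed-migratetype case within a single MAX_PAGE_ORDER piece.

```c
#include <assert.h>

#define MODEL_MAX_PAGE_ORDER 10

/* Hypothetical counters standing in for calls into the buddy allocator. */
struct model_buddy_stats {
	unsigned long calls;
	int max_order_seen;
};

/* Stand-in for the buddy free path, which only accepts <= MAX_PAGE_ORDER. */
static void model_buddy_free(struct model_buddy_stats *s, unsigned long pfn,
			     int order)
{
	assert(order <= MODEL_MAX_PAGE_ORDER);
	assert(pfn % (1UL << order) == 0);
	s->calls++;
	if (order > s->max_order_seen)
		s->max_order_seen = order;
}

/*
 * Hypothetical folio-layer free path: a chunk larger than MAX_PAGE_ORDER
 * (e.g. a gigantic hugetlb folio from alloc_contig_range()) is cut into
 * MAX_PAGE_ORDER pieces before anything reaches the buddy allocator.
 */
static void model_folio_free(struct model_buddy_stats *s, unsigned long pfn,
			     int order)
{
	unsigned long end_pfn = pfn + (1UL << order);
	int chunk_order = order;

	if (chunk_order > MODEL_MAX_PAGE_ORDER)
		chunk_order = MODEL_MAX_PAGE_ORDER;
	for (; pfn != end_pfn; pfn += 1UL << chunk_order)
		model_buddy_free(s, pfn, chunk_order);
}
```

In this layering, the buddy side never sees an order above MAX_PAGE_ORDER, so any per-pageblock splitting it still needs would be limited to the isolation case.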

-- 
Cheers,

David / dhildenb