Message-ID: <20241210211613.GC2508492@cmpxchg.org>
Date: Tue, 10 Dec 2024 16:16:13 -0500
From: Johannes Weiner <hannes@...xchg.org>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Andrew Morton <akpm@...ux-foundation.org>, Zi Yan <ziy@...dia.com>,
	Vlastimil Babka <vbabka@...e.cz>, Yu Zhao <yuzhao@...gle.com>
Subject: Re: [PATCH v2 1/2] mm/page_alloc: conditionally split >
 pageblock_order pages in free_one_page() and move_freepages_block_isolate()

On Tue, Dec 10, 2024 at 11:29:52AM +0100, David Hildenbrand wrote:
> @@ -1225,27 +1225,53 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
>  
> -/* Split a multi-block free page into its individual pageblocks. */
> -static void split_large_buddy(struct zone *zone, struct page *page,
> -			      unsigned long pfn, int order, fpi_t fpi)
> +static bool pfnblock_migratetype_equal(unsigned long pfn,
> +		unsigned long end_pfn, int mt)
>  {
> -	unsigned long end = pfn + (1 << order);
> +	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | end_pfn, pageblock_nr_pages));
>  
> +	while (pfn != end_pfn) {
> +		struct page *page = pfn_to_page(pfn);
> +
> +		if (unlikely(mt != get_pfnblock_migratetype(page, pfn)))
> +			return false;
> +		pfn += pageblock_nr_pages;
> +	}
> +	return true;
> +}
> +
> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
> +		unsigned long pfn, int order, fpi_t fpi_flags)
> +{
> +	const unsigned long end_pfn = pfn + (1 << order);
> +	int mt = get_pfnblock_migratetype(page, pfn);
> +
> +	VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>  	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>  	/* Caller removed page from freelist, buddy info cleared! */
>  	VM_WARN_ON_ONCE(PageBuddy(page));
>  
> -	if (order > pageblock_order)
> -		order = pageblock_order;
> +	/*
> +	 * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
> +	 * pages that cover pageblocks with different migratetypes; for example
> +	 * only some pageblocks might be MIGRATE_ISOLATE. In that (unlikely)
> +	 * case, fall back to freeing individual pageblocks so they get put
> +	 * onto the right lists.
> +	 */
> +	if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
> +	    likely(order <= pageblock_order) ||
> +	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
> +		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
> +		return;
> +	}

Ok, if memory isolation is disabled, we know the migratetypes all
match up, and we can skip the check. However, if memory isolation is
enabled but the caller isn't move_freepages_block_isolate(), we still
do the check unnecessarily and slow down the boot, no?

Having a function guess the caller is a bit of an anti-pattern. The
resulting code is hard to follow, and it's very easy to
unintentionally burden some cases with unnecessary stuff. It's better
to unshare paths until you don't need conditionals like this.
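
To illustrate - names invented, locking and page prep elided, not even
compile-tested - something along these lines would keep the common
free path oblivious and leave the per-pageblock walk to the one caller
that actually needs it:

/* Common case: hand the chunk straight to the buddy, no walking. */
static void free_one_page_fast(struct zone *zone, struct page *page,
			       unsigned long pfn, int order, fpi_t fpi_flags)
{
	int mt = get_pfnblock_migratetype(page, pfn);

	__free_one_page(page, pfn, zone, order, mt, fpi_flags);
}

#ifdef CONFIG_MEMORY_ISOLATION
/*
 * Isolation path: split into pageblocks so each lands on the right
 * list. Assumes pageblock alignment and order >= pageblock_order.
 */
static void free_one_page_split(struct zone *zone, unsigned long pfn,
				int order, fpi_t fpi_flags)
{
	unsigned long end_pfn = pfn + (1 << order);

	for (; pfn != end_pfn; pfn += pageblock_nr_pages) {
		struct page *page = pfn_to_page(pfn);
		int mt = get_pfnblock_migratetype(page, pfn);

		__free_one_page(page, pfn, zone, pageblock_order,
				mt, fpi_flags);
	}
}
#endif

move_freepages_block_isolate() would call the split variant directly,
and nobody else pays for it.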

In addition to the fastpath, I think you're also punishing the
move_freepages_block_isolate() case. We *know* we just changed the
type of one of the buddy's blocks, and yet you're still checking the
range again to decide whether to split.

All of this to accommodate hugetlb, which might not even be compiled
in? Grrrr.

Like you, I was quite surprised to see that GFP_COMP patch in the
buddy hotpath splitting *everything* into blocks - on the off chance
that somebody might free a hugetlb page. Even if !CONFIG_HUGETLB. Just
- what the hell. We shouldn't merge "I only care about my niche use
case at the expense of literally everybody else" patches like this.

My vote is NAK on this patch, and a retro-NAK on the GFP_COMP patch.

The buddy allocator operates on the assumption of MAX_PAGE_ORDER. If
we support folios of a larger size sourced from other allocators, then
it should be the folio layer that does the discriminating. So if
folio_put() detects
this is a massive alloc_contig chunk, then it should take a different
freeing path. Do the splitting in there, then pass valid chunks back
to the buddy. That would keep the layering cleaner and the cornercase
overhead out of the allocator fastpath.
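
Handwavy sketch of what I mean - the name is made up, and the compound
teardown plus per-chunk refcount prep are elided - on the folio side:

/*
 * Oversized (> MAX_PAGE_ORDER) alloc_contig folio: split it here and
 * hand buddy-sized chunks back to the page allocator.
 */
static void free_oversized_contig_folio(struct folio *folio)
{
	unsigned long pfn = folio_pfn(folio);
	unsigned long end_pfn = pfn + folio_nr_pages(folio);

	/* ...undo folio/compound metadata, fix up chunk refcounts... */

	for (; pfn != end_pfn; pfn += MAX_ORDER_NR_PAGES)
		__free_pages(pfn_to_page(pfn), MAX_PAGE_ORDER);
}

folio_put() would branch there when it sees an order above
MAX_PAGE_ORDER, and free_one_page() never sees anything it can't
handle.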

It would also avoid the pointless and fragile attempt at freeing a
big, non-buddy chunk through the PCP.
