linux-kernel - Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20231016143717.GH470544@cmpxchg.org>
Date:   Mon, 16 Oct 2023 10:37:17 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Zi Yan <ziy@...dia.com>
Cc:     David Hildenbrand <david@...hat.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

On Mon, Oct 16, 2023 at 09:35:34AM -0400, Zi Yan wrote:
> > The attached patch has all the suggested changes, let me know how it
> > looks to you. Thanks.
> 
> The one I sent has free page accounting issues. The attached one fixes them.

Do you still have the warnings? I wonder what went wrong.

> @@ -883,6 +886,10 @@ int split_free_page(struct page *free_page,
>  	mt = get_pfnblock_migratetype(free_page, free_page_pfn);
>  	del_page_from_free_list(free_page, zone, order, mt);
>  
> +	set_pageblock_migratetype(free_page, mt1);
> +	set_pageblock_migratetype(pfn_to_page(free_page_pfn + split_pfn_offset),
> +				  mt2);
> +
>  	for (pfn = free_page_pfn;
>  	     pfn < free_page_pfn + (1UL << order);) {
>  		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);

I don't think this is quite right.

With CONFIG_ARCH_FORCE_MAX_ORDER it's possible that we're dealing with
a buddy that is more than two blocks:

[pageblock 0][pageblock 1][pageblock 2][pageblock 3]
[buddy                                             ]
                                       [isolate range ..

That for loop splits the buddy into 4 blocks. The code above would set
pageblock 0 to old_mt, and pageblock 1 to new_mt. But it should only
set pageblock 3 to new_mt.

My proposal had the mt update in the caller:

> @@ -139,6 +139,62 @@ static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long e
>  	return NULL;
>  }
>  
> +/*
> + * additional steps for moving free pages during page isolation
> + */
> +static int move_freepages_for_isolation(struct zone *zone, unsigned long start_pfn,
> +			  unsigned long end_pfn, int old_mt, int new_mt)
> +{
> +	struct page *start_page = pfn_to_page(start_pfn);
> +	unsigned long pfn;
> +
> +	VM_WARN_ON(start_pfn & (pageblock_nr_pages - 1));
> +	VM_WARN_ON(start_pfn + pageblock_nr_pages - 1 != end_pfn);
> +
> +	/*
> +	 * A free page may be comprised of 2^n blocks, which means our
> +	 * block of interest could be head or tail in such a page.
> +	 *
> +	 * If we're a tail, update the type of our block, then split
> +	 * the page into pageblocks. The splitting will do the leg
> +	 * work of sorting the blocks into the right freelists.
> +	 *
> +	 * If we're a head, split the page into pageblocks first. This
> +	 * ensures the migratetypes still match up during the freelist
> +	 * removal. Then do the regular scan for buddies in the block
> +	 * of interest, which will handle the rest.
> +	 *
> +	 * In theory, we could try to preserve 2^1 and larger blocks
> +	 * that lie outside our range. In practice, MAX_ORDER is
> +	 * usually one or two pageblocks anyway, so don't bother.
> +	 *
> +	 * Note that this only applies to page isolation, which calls
> +	 * this on random blocks in the pfn range! When we move stuff
> +	 * from inside the page allocator, the pages are coming off
> +	 * the freelist (can't be tail) and multi-block pages are
> +	 * handled directly in the stealing code (can't be a head).
> +	 */
> +
> +	/* We're a tail */
> +	pfn = find_straddling_buddy(start_pfn);
> +	if (pfn != start_pfn) {
> +		struct page *free_page = pfn_to_page(pfn);
> +
> +		split_free_page(free_page, buddy_order(free_page),
> +				pageblock_nr_pages, old_mt, new_mt);
> +		return pageblock_nr_pages;
> +	}
> +
> +	/* We're a head */
> +	if (PageBuddy(start_page) && buddy_order(start_page) > pageblock_order) {
> +		split_free_page(start_page, buddy_order(start_page),
> +				pageblock_nr_pages, new_mt, old_mt);
> +		return pageblock_nr_pages;
> +	}

i.e. here ^: set the mt of the block that's in isolation range, then
split the block.

I think I can guess the warning you were getting: in the head case, we
need to change the type of the head pageblock that's on the
freelist. If we do it before calling split, the
del_page_from_freelist() in there warns about the wrong type.

How about pulling the freelist removal out of split_free_page()?

	del_page_from_freelist(huge_buddy);
	set_pageblock_migratetype(start_page, MIGRATE_ISOLATE);
	split_free_page(huge_buddy, buddy_order(), pageblock_nr_pages);
	return pageblock_nr_pages;