linux-kernel - Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5B61B70F-E85D-4E6F-8856-17A90B899F98@nvidia.com>
Date:   Mon, 16 Oct 2023 11:00:33 -0400
From:   Zi Yan <ziy@...dia.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     David Hildenbrand <david@...hat.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

On 16 Oct 2023, at 10:37, Johannes Weiner wrote:

> On Mon, Oct 16, 2023 at 09:35:34AM -0400, Zi Yan wrote:
>>> The attached patch has all the suggested changes, let me know how it
>>> looks to you. Thanks.
>>
>> The one I sent has free page accounting issues. The attached one fixes them.
>
> Do you still have the warnings? I wonder what went wrong.

No warnings. But something with the code:

1. in your version, split_free_page() is called without changing any pageblock
migratetypes, then split_free_page() is just a no-op, since the page is
just deleted from the free list, then freed via different orders. Buddy allocator
will merge them back.

2. in my version, I set pageblock migratetype to new_mt before split_free_page(),
but it causes free page accounting issues, since in the case of head, free pages
are deleted from new_mt when they are in old_mt free list and the accounting
decreases new_mt free page number instead of old_mt one.

Basically, split_free_page() is awkward as it relies on preset migratetypes,
which changes migratetypes without deleting the free pages from the list first.
That is why I came up with the new split_free_page() below.

>
>> @@ -883,6 +886,10 @@ int split_free_page(struct page *free_page,
>>  	mt = get_pfnblock_migratetype(free_page, free_page_pfn);
>>  	del_page_from_free_list(free_page, zone, order, mt);
>>
>> +	set_pageblock_migratetype(free_page, mt1);
>> +	set_pageblock_migratetype(pfn_to_page(free_page_pfn + split_pfn_offset),
>> +				  mt2);
>> +
>>  	for (pfn = free_page_pfn;
>>  	     pfn < free_page_pfn + (1UL << order);) {
>>  		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>
> I don't think this is quite right.
>
> With CONFIG_ARCH_FORCE_MAX_ORDER it's possible that we're dealing with
> a buddy that is more than two blocks:
>
> [pageblock 0][pageblock 1][pageblock 2][pageblock 3]
> [buddy                                             ]
>                                        [isolate range ..
>
> That for loop splits the buddy into 4 blocks. The code above would set
> pageblock 0 to old_mt, and pageblock 1 to new_mt. But it should only
> set pageblock 3 to new_mt.

OK. I think I need to fix split_free_page().

Hmm, if CONFIG_ARCH_FORCE_MAX_ORDER can make a buddy have more than one
pageblock and in turn makes an in-use page have more than one pageblock,
we will have problems. Since in isolate_single_pageblock(), an in-use page
can have part of its pageblock set to a different migratetype and be freed,
causing the free page with unmatched migratetypes. We might need to
free pages at pageblock_order if their orders are bigger than pageblock_order.

Which arch with CONFIG_ARCH_FORCE_MAX_ORDER can have a buddy containing more
than one pageblocks? I would like to make some tests.

>
> My proposal had the mt update in the caller:
>
>> @@ -139,6 +139,62 @@ static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long e
>>  	return NULL;
>>  }
>>
>> +/*
>> + * additional steps for moving free pages during page isolation
>> + */
>> +static int move_freepages_for_isolation(struct zone *zone, unsigned long start_pfn,
>> +			  unsigned long end_pfn, int old_mt, int new_mt)
>> +{
>> +	struct page *start_page = pfn_to_page(start_pfn);
>> +	unsigned long pfn;
>> +
>> +	VM_WARN_ON(start_pfn & (pageblock_nr_pages - 1));
>> +	VM_WARN_ON(start_pfn + pageblock_nr_pages - 1 != end_pfn);
>> +
>> +	/*
>> +	 * A free page may be comprised of 2^n blocks, which means our
>> +	 * block of interest could be head or tail in such a page.
>> +	 *
>> +	 * If we're a tail, update the type of our block, then split
>> +	 * the page into pageblocks. The splitting will do the leg
>> +	 * work of sorting the blocks into the right freelists.
>> +	 *
>> +	 * If we're a head, split the page into pageblocks first. This
>> +	 * ensures the migratetypes still match up during the freelist
>> +	 * removal. Then do the regular scan for buddies in the block
>> +	 * of interest, which will handle the rest.
>> +	 *
>> +	 * In theory, we could try to preserve 2^1 and larger blocks
>> +	 * that lie outside our range. In practice, MAX_ORDER is
>> +	 * usually one or two pageblocks anyway, so don't bother.
>> +	 *
>> +	 * Note that this only applies to page isolation, which calls
>> +	 * this on random blocks in the pfn range! When we move stuff
>> +	 * from inside the page allocator, the pages are coming off
>> +	 * the freelist (can't be tail) and multi-block pages are
>> +	 * handled directly in the stealing code (can't be a head).
>> +	 */
>> +
>> +	/* We're a tail */
>> +	pfn = find_straddling_buddy(start_pfn);
>> +	if (pfn != start_pfn) {
>> +		struct page *free_page = pfn_to_page(pfn);
>> +
>> +		split_free_page(free_page, buddy_order(free_page),
>> +				pageblock_nr_pages, old_mt, new_mt);
>> +		return pageblock_nr_pages;
>> +	}
>> +
>> +	/* We're a head */
>> +	if (PageBuddy(start_page) && buddy_order(start_page) > pageblock_order) {
>> +		split_free_page(start_page, buddy_order(start_page),
>> +				pageblock_nr_pages, new_mt, old_mt);
>> +		return pageblock_nr_pages;
>> +	}
>
> i.e. here ^: set the mt of the block that's in isolation range, then
> split the block.
>
> I think I can guess the warning you were getting: in the head case, we
> need to change the type of the head pageblock that's on the
> freelist. If we do it before calling split, the
> del_page_from_freelist() in there warns about the wrong type.
>
> How about pulling the freelist removal out of split_free_page()?
>
> 	del_page_from_freelist(huge_buddy);
> 	set_pageblock_migratetype(start_page, MIGRATE_ISOLATE);
> 	split_free_page(huge_buddy, buddy_order(), pageblock_nr_pages);
> 	return pageblock_nr_pages;

Yes, this is better. Let me change to this implementation.

But I would like to test it on an environment where a buddy contains more than
one pageblocks first. I probably can change MAX_ORDER of x86_64 to do it locally.
I will report back.

--
Best Regards,
Yan, Zi

Download attachment "signature.asc" of type "application/pgp-signature" (855 bytes)