linux-kernel - Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <236FF93B-FDBC-4168-AD9F-28F53CC1B6C0@nvidia.com>
Date:   Mon, 16 Oct 2023 16:48:41 -0400
From:   Zi Yan <ziy@...dia.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     David Hildenbrand <david@...hat.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

On 16 Oct 2023, at 16:39, Johannes Weiner wrote:

> On Mon, Oct 16, 2023 at 04:26:30PM -0400, Johannes Weiner wrote:
>> On Mon, Oct 16, 2023 at 03:49:49PM -0400, Zi Yan wrote:
>>> On 16 Oct 2023, at 14:51, Johannes Weiner wrote:
>>>
>>>> On Mon, Oct 16, 2023 at 11:00:33AM -0400, Zi Yan wrote:
>>>>> On 16 Oct 2023, at 10:37, Johannes Weiner wrote:
>>>>>
>>>>>> On Mon, Oct 16, 2023 at 09:35:34AM -0400, Zi Yan wrote:
>>>>>>>> The attached patch has all the suggested changes, let me know how it
>>>>>>>> looks to you. Thanks.
>>>>>>>
>>>>>>> The one I sent has free page accounting issues. The attached one fixes them.
>>>>>>
>>>>>> Do you still have the warnings? I wonder what went wrong.
>>>>>
>>>>> No warnings. But something with the code:
>>>>>
>>>>> 1. in your version, split_free_page() is called without changing any pageblock
>>>>> migratetypes, then split_free_page() is just a no-op, since the page is
>>>>> just deleted from the free list, then freed via different orders. Buddy allocator
>>>>> will merge them back.
>>>>
>>>> Hm not quite.
>>>>
>>>> If it's the tail block of a buddy, I update its type before
>>>> splitting. The splitting loop looks up the type of each block for
>>>> sorting it onto freelists.
>>>>
>>>> If it's the head block, yes I split it first according to its old
>>>> type. But then I let it fall through to scanning the block, which will
>>>> find that buddy, update its type and move it.
>>>
>>> That is the issue, since split_free_page() assumes the pageblocks of
>>> that free page have different types. It basically just free the page
>>> with different small orders summed up to the original free page order.
>>> If all pageblocks of the free page have the same migratetype, __free_one_page()
>>> will merge these small order pages back to the original order free page.
>>
>> duh, of course, you're right. Thanks for patiently explaining this.
>>
>>>>> 2. in my version, I set pageblock migratetype to new_mt before split_free_page(),
>>>>> but it causes free page accounting issues, since in the case of head, free pages
>>>>> are deleted from new_mt when they are in old_mt free list and the accounting
>>>>> decreases new_mt free page number instead of old_mt one.
>>>>
>>>> Right, that makes sense.
>>>>
>>>>> Basically, split_free_page() is awkward as it relies on preset migratetypes,
>>>>> which changes migratetypes without deleting the free pages from the list first.
>>>>> That is why I came up with the new split_free_page() below.
>>>>
>>>> Yeah, the in-between thing is bad. Either it fixes the migratetype
>>>> before deletion, or it doesn't do the deletion. I'm thinking it would
>>>> be simpler to move the deletion out instead.
>>>
>>> Yes and no. After deletion, a free page no longer has PageBuddy set and
>>> has buddy_order information cleared. Either we reset PageBuddy and order
>>> to the deleted free page, or split_free_page() needs to be changed to
>>> accept pages without the information (basically remove the PageBuddy
>>> and order check code).
>>
>> Good point, that requires extra care.
>>
>> It's correct in the code now, but it deserves a comment, especially
>> because of the "buddy" naming in the new split function.
>>
>>>>> Hmm, if CONFIG_ARCH_FORCE_MAX_ORDER can make a buddy have more than one
>>>>> pageblock and in turn makes an in-use page have more than one pageblock,
>>>>> we will have problems. Since in isolate_single_pageblock(), an in-use page
>>>>> can have part of its pageblock set to a different migratetype and be freed,
>>>>> causing the free page with unmatched migratetypes. We might need to
>>>>> free pages at pageblock_order if their orders are bigger than pageblock_order.
>>>>
>>>> Is this a practical issue? You mentioned that right now only gigantic
>>>> pages can be larger than a pageblock, and those are freed in order-0
>>>> chunks.
>>>
>>> Only if the system allocates a page (non hugetlb pages) with >pageblock_order
>>> and frees it with the same order. I just do not know if such pages exist on
>>> other arch than x86. Maybe I just think too much.
>>
>> Hm, I removed LRU pages from the handling (and added the warning) but
>> I left in PageMovable(). The only users are z3fold, zsmalloc and
>> memory ballooning. AFAICS none of them can be bigger than a pageblock.
>> Let me remove that and add a warning for that case as well.
>>
>> This way, we only attempt to migrate hugetlb, where we know the free
>> path - and get warnings for anything else that's larger than expected.
>>
>> This seems like the safest option. On the off chance that there is a
>> regression, it won't jeopardize anybody's systems, while the warning
>> provides all the information we need to debug what's going on.
>
> This delta on top?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b5292ad9860c..0da7c61af37e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1628,7 +1628,7 @@ static int move_freepages_block(struct zone *zone, struct page *page,
>  }
>
>  #ifdef CONFIG_MEMORY_ISOLATION
> -/* Look for a multi-block buddy that straddles start_pfn */
> +/* Look for a buddy that straddles start_pfn */
>  static unsigned long find_large_buddy(unsigned long start_pfn)
>  {
>  	int order = 0;
> @@ -1652,7 +1652,7 @@ static unsigned long find_large_buddy(unsigned long start_pfn)
>  	return start_pfn;
>  }
>
> -/* Split a multi-block buddy into its individual pageblocks */
> +/* Split a multi-block free page into its individual pageblocks */
>  static void split_large_buddy(struct zone *zone, struct page *page,
>  			      unsigned long pfn, int order)
>  {
> @@ -1661,6 +1661,9 @@ static void split_large_buddy(struct zone *zone, struct page *page,
>  	VM_WARN_ON_ONCE(order < pageblock_order);
>  	VM_WARN_ON_ONCE(pfn & (pageblock_nr_pages - 1));
>
> +	/* Caller removed page from freelist, buddy info cleared! */
> +	VM_WARN_ON_ONCE(PageBuddy(page));
> +
>  	while (pfn != end_pfn) {
>  		int mt = get_pfnblock_migratetype(page, pfn);
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index b4d53545496d..c8b3c0699683 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -399,14 +399,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>  				continue;
>  			}
>
> -			VM_WARN_ON_ONCE_PAGE(PageLRU(page), page);
> -
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> -			/*
> -			 * hugetlb, and movable compound pages can be
> -			 * migrated. Otherwise, fail the isolation.
> -			 */
> -			if (PageHuge(page) || __PageMovable(page)) {
> +			if (PageHuge(page)) {
>  				struct compact_control cc = {
>  					.nr_migratepages = 0,
>  					.order = -1,
> @@ -426,9 +420,19 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>
>  				pfn = head_pfn + nr_pages;
>  				continue;
> -			} else
> +			}
> +
> +			/*
> +			 * These pages are movable too, but they're
> +			 * not expected to exceed pageblock_order.
> +			 *
> +			 * Let us know when they do, so we can add
> +			 * proper free and split handling for them.
> +			 */
> +			VM_WARN_ON_ONCE_PAGE(PageLRU(page), page);
> +			VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page);
>  #endif
> -				goto failed;
> +			goto failed;
>  		}
>
>  		pfn++;

LGTM.

I was thinking about adding

VM_WARN_ON_ONCE(order > pageblock_order, page);

in __free_pages() to catch all possible cases, but that is a really hot path.

And just for the record, we probably can easily fix the above warnings,
if they ever show up, by freeing >pageblock_order pages in unit of
pageblock_order.

--
Best Regards,
Yan, Zi

Download attachment "signature.asc" of type "application/pgp-signature" (855 bytes)