[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <43350F6E-E56D-41C8-8441-A96D83B7D7C1@nvidia.com>
Date: Fri, 13 Oct 2023 10:51:24 -0400
From: Zi Yan <ziy@...dia.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: David Hildenbrand <david@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>,
Mike Kravetz <mike.kravetz@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Miaohe Lin <linmiaohe@...wei.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
On 12 Oct 2023, at 20:06, Zi Yan wrote:
> On 10 Oct 2023, at 17:12, Johannes Weiner wrote:
>
>> Hello!
>>
>> On Mon, Oct 02, 2023 at 10:26:44PM -0400, Zi Yan wrote:
>>> On 27 Sep 2023, at 22:51, Zi Yan wrote:
>>> I attached my revised patch 2 and 3 (with all the suggestions above).
>>
>> Thanks! It took me a bit to read through them. It's a tricky codebase!
>>
>> Some comments below.
>>
>>> From 1c8f99cff5f469ee89adc33e9c9499254cad13f2 Mon Sep 17 00:00:00 2001
>>> From: Zi Yan <ziy@...dia.com>
>>> Date: Mon, 25 Sep 2023 16:27:14 -0400
>>> Subject: [PATCH v2 1/2] mm: set migratetype after free pages are moved between
>>> free lists.
>>>
>>> This avoids changing migratetype after move_freepages() or
>>> move_freepages_block(), which is error prone. It also prepares for upcoming
>>> changes to fix move_freepages() not moving free pages partially in the
>>> range.
>>>
>>> Signed-off-by: Zi Yan <ziy@...dia.com>
>>
>> This is great and indeed makes the callsites much simpler. Thanks,
>> I'll fold this into the series.
>>
>>> @@ -1597,9 +1615,29 @@ static int move_freepages(struct zone *zone, unsigned long start_pfn,
>>> unsigned long end_pfn, int old_mt, int new_mt)
>>> {
>>> struct page *page;
>>> - unsigned long pfn;
>>> + unsigned long pfn, pfn2;
>>> unsigned int order;
>>> int pages_moved = 0;
>>> + unsigned long mt_changed_pfn = start_pfn - pageblock_nr_pages;
>>> + unsigned long new_start_pfn = get_freepage_start_pfn(start_pfn);
>>> +
>>> + /* split at start_pfn if it is in the middle of a free page */
>>> + if (new_start_pfn != start_pfn && PageBuddy(pfn_to_page(new_start_pfn))) {
>>> + struct page *new_page = pfn_to_page(new_start_pfn);
>>> + int new_page_order = buddy_order(new_page);
>>
>> get_freepage_start_pfn() returns start_pfn if it didn't find a large
>> buddy, so the buddy check shouldn't be necessary, right?
>>
>>> + if (new_start_pfn + (1 << new_page_order) > start_pfn) {
>>
>> This *should* be implied according to the comments on
>> get_freepage_start_pfn(), but it currently isn't. Doing so would help
>> here, and seemingly also in alloc_contig_range().
>>
>> How about this version of get_freepage_start_pfn()?
>>
>> /*
>> * Scan the range before this pfn for a buddy that straddles it
>> */
>> static unsigned long find_straddling_buddy(unsigned long start_pfn)
>> {
>> int order = 0;
>> struct page *page;
>> unsigned long pfn = start_pfn;
>>
>> while (!PageBuddy(page = pfn_to_page(pfn))) {
>> /* Nothing found */
>> if (++order > MAX_ORDER)
>> return start_pfn;
>> pfn &= ~0UL << order;
>> }
>>
>> /*
>> * Found a preceding buddy, but does it straddle?
>> */
>> if (pfn + (1 << buddy_order(page)) > start_pfn)
>> return pfn;
>>
>> /* Nothing found */
>> return start_pfn;
>> }
>>
>>> @@ -1614,10 +1652,43 @@ static int move_freepages(struct zone *zone, unsigned long start_pfn,
>>>
>>> order = buddy_order(page);
>>> move_to_free_list(page, zone, order, old_mt, new_mt);
>>> + /*
>>> + * set page migratetype 1) only after we move all free pages in
>>> + * one pageblock and 2) for all pageblocks within the page.
>>> + *
>>> + * for 1), since move_to_free_list() checks page migratetype with
>>> + * old_mt and changing one page migratetype affects all pages
>>> + * within the same pageblock, if we are moving more than
>>> + * one free pages in the same pageblock, setting migratetype
>>> + * right after first move_to_free_list() triggers the warning
>>> + * in the following move_to_free_list().
>>> + *
>>> + * for 2), when a free page order is greater than pageblock_order,
>>> + * all pageblocks within the free page need to be changed after
>>> + * move_to_free_list().
>>
>> I think this can be somewhat simplified.
>>
>> There are two assumptions we can make. Buddies always consist of 2^n
>> pages. And buddies and pageblocks are naturally aligned. This means
>> that if this pageblock has the start of a buddy that straddles into
>> the next pageblock(s), it must be the first page in the block. That in
>> turn means we can move the handling before the loop.
>>
>> If we split first, it also makes the loop a little simpler because we
>> know that any buddies that start inside this block cannot extend
>> beyond it (due to the alignment). The loop how it was originally
>> written can remain untouched.
>>
>>> + */
>>> + if (pfn + (1 << order) > pageblock_end_pfn(pfn)) {
>>> + for (pfn2 = pfn;
>>> + pfn2 < min_t(unsigned long,
>>> + pfn + (1 << order),
>>> + end_pfn + 1);
>>> + pfn2 += pageblock_nr_pages) {
>>> + set_pageblock_migratetype(pfn_to_page(pfn2),
>>> + new_mt);
>>> + mt_changed_pfn = pfn2;
>>
>> Hm, this seems to assume that start_pfn to end_pfn can be more than
>> one block. Why is that? This function is only used on single blocks.
>
> You are right. I made unnecessary assumptions when I wrote the code.
>
>>
>>> + }
>>> + /* split the free page if it goes beyond the specified range */
>>> + if (pfn + (1 << order) > (end_pfn + 1))
>>> + split_free_page(page, order, end_pfn + 1 - pfn);
>>> + }
>>> pfn += 1 << order;
>>> pages_moved += 1 << order;
>>> }
>>> - set_pageblock_migratetype(pfn_to_page(start_pfn), new_mt);
>>> + /* set migratetype for the remaining pageblocks */
>>> + for (pfn2 = mt_changed_pfn + pageblock_nr_pages;
>>> + pfn2 <= end_pfn;
>>> + pfn2 += pageblock_nr_pages)
>>> + set_pageblock_migratetype(pfn_to_page(pfn2), new_mt);
>>
>> If I rework the code on the above, I'm arriving at the following:
>>
>> static int move_freepages(struct zone *zone, unsigned long start_pfn,
>> unsigned long end_pfn, int old_mt, int new_mt)
>> {
>> struct page *start_page = pfn_to_page(start_pfn);
>> int pages_moved = 0;
>> unsigned long pfn;
>>
>> VM_WARN_ON(start_pfn & (pageblock_nr_pages - 1));
>> VM_WARN_ON(start_pfn + pageblock_nr_pages - 1 != end_pfn);
>>
>> /*
>> * A free page may be comprised of 2^n blocks, which means our
>> * block of interest could be head or tail in such a page.
>> *
>> * If we're a tail, update the type of our block, then split
>> * the page into pageblocks. The splitting will do the leg
>> * work of sorting the blocks into the right freelists.
>> *
>> * If we're a head, split the page into pageblocks first. This
>> * ensures the migratetypes still match up during the freelist
>> * removal. Then do the regular scan for buddies in the block
>> * of interest, which will handle the rest.
>> *
>> * In theory, we could try to preserve 2^1 and larger blocks
>> * that lie outside our range. In practice, MAX_ORDER is
>> * usually one or two pageblocks anyway, so don't bother.
>> *
>> * Note that this only applies to page isolation, which calls
>> * this on random blocks in the pfn range! When we move stuff
>> * from inside the page allocator, the pages are coming off
>> * the freelist (can't be tail) and multi-block pages are
>> * handled directly in the stealing code (can't be a head).
>> */
>>
>> /* We're a tail */
>> pfn = find_straddling_buddy(start_pfn);
>> if (pfn != start_pfn) {
>> struct page *free_page = pfn_to_page(pfn);
>>
>> set_pageblock_migratetype(start_page, new_mt);
>> split_free_page(free_page, buddy_order(free_page),
>> pageblock_nr_pages);
>> return pageblock_nr_pages;
>> }
>>
>> /* We're a head */
>> if (PageBuddy(start_page) && buddy_order(start_page) > pageblock_order)
>> split_free_page(start_page, buddy_order(start_page),
>> pageblock_nr_pages);
>
> This actually can be:
>
> /* We're a head */
> if (PageBuddy(start_page) && buddy_order(start_page) > pageblock_order) {
> set_pageblock_migratetype(start_page, new_mt);
> split_free_page(start_page, buddy_order(start_page),
> pageblock_nr_pages);
> return pageblock_nr_pages;
> }
>
>
>>
>> /* Move buddies within the block */
>> while (pfn <= end_pfn) {
>> struct page *page = pfn_to_page(pfn);
>> int order, nr_pages;
>>
>> if (!PageBuddy(page)) {
>> pfn++;
>> continue;
>> }
>>
>> /* Make sure we are not inadvertently changing nodes */
>> VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
>> VM_BUG_ON_PAGE(page_zone(page) != zone, page);
>>
>> order = buddy_order(page);
>> nr_pages = 1 << order;
>>
>> move_to_free_list(page, zone, order, old_mt, new_mt);
>>
>> pfn += nr_pages;
>> pages_moved += nr_pages;
>> }
>>
>> set_pageblock_migratetype(start_page, new_mt);
>>
>> return pages_moved;
>> }
>>
>> Does this look reasonable to you?
>
> Looks good to me. Thanks.
>
>>
>> Note that the page isolation specific stuff comes first. If this code
>> holds up, we should be able to move it to page-isolation.c and keep it
>> out of the regular allocator path.
>
> You mean move the tail and head part to set_migratetype_isolate()?
> And change move_freepages_block() to separate prep_move_freepages_block(),
> the tail and head code, and move_freepages()? It should work and looks
> like a similar code pattern as steal_suitable_fallback().
The attached patch has all the suggested changes, let me know how it
looks to you. Thanks.
--
Best Regards,
Yan, Zi
View attachment "0001-mm-page_isolation-split-cross-pageblock-free-pages-d.patch" of type "text/plain" (11981 bytes)
Download attachment "signature.asc" of type "application/pgp-signature" (855 bytes)
Powered by blists - more mailing lists