[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ac73d772-c585-1d9e-c8ee-36c51b608906@redhat.com>
Date: Mon, 2 Oct 2023 13:43:30 +0200
From: David Hildenbrand <david@...hat.com>
To: Zi Yan <ziy@...dia.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
Vlastimil Babka <vbabka@...e.cz>,
Mike Kravetz <mike.kravetz@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Miaohe Lin <linmiaohe@...wei.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
>>> I can do it after I fix this. That change might or might not help only if we make
>>> some redesign on how migratetype is managed. If MIGRATE_ISOLATE does not
>>> overwrite existing migratetype, the code might not need to split a page and move
>>> it to MIGRATE_ISOLATE freelist?
>>
>> Did someone test how memory offlining plays along with that? (I can try myself
>> within the next 1-2 weeks)
>>
>> There [mm/memory_hotplug.c:offline_pages] we always cover full MAX_ORDER ranges,
>> though.
>>
>> ret = start_isolate_page_range(start_pfn, end_pfn,
>> MIGRATE_MOVABLE,
>> MEMORY_OFFLINE | REPORT_FAILURE,
>> GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
>
> Since a full MAX_ORDER range is passed, no free page split will happen.
Okay, thanks for verifying that it should not be affected!
>
>>
>>>
>>> The fundamental issue in alloc_contig_range() is that to work at
>>> pageblock level, a page (>pageblock_order) can have one part is isolated and
>>> the rest is a different migratetype. {add_to,move_to,del_page_from}_free_list()
>>> now checks first pageblock migratetype, so such a page needs to be removed
>>> from its free_list, set MIGRATE_ISOLATE on one of the pageblock, split, and
>>> finally put back to multiple free lists. This needs to be done at isolation stage
>>> before free pages are removed from their free lists (the stage after isolation).
>>
>> One idea was to always isolate larger chunks, and handle movability checks/split/etc
>> at a later stage. Once isolation would be decoupled from the actual/original migratetype,
>> the could have been easier to handle (especially some corner cases I had in mind back then).
>
> I think it is a good idea. When I coded alloc_contig_range() up, I tried to
> accommodate existing set_migratetype_isolate(), which calls has_unmovable_pages().
> If these two are decoupled, set_migrateype_isolate() can work on MAX_ORDER-aligned
> ranges and has_unmovable_pages() can still work on pageblock-aligned ranges.
> Let me give this a try.
>
But again, just some thought I had back then, maybe it doesn't help for
anything; I found more time to look into the whole thing in more detail.
>>
>>> If MIGRATE_ISOLATE is a separate flag and we are OK with leaving isolated pages
>>> in their original migratetype and check migratetype before allocating a page,
>>> that might help. But that might add extra work (e.g., splitting a partially
>>> isolated free page before allocation) in the really hot code path, which is not
>>> desirable.
>>
>> With MIGRATE_ISOLATE being a separate flag, one idea was to have not a single
>> separate isolate list, but one per "proper migratetype". But again, just some random
>> thoughts I had back then, I never had sufficient time to think it all through.
>
> Got it. I will think about it.
>
> One question on separate MIGRATE_ISOLATE:
>
> the implementation I have in mind is that MIGRATE_ISOLATE will need a dedicated flag
> bit instead of being one of migratetype. But now there are 5 migratetypes +
Exactly what I was concerned about back then ...
> MIGRATE_ISOLATE and PB_migratetype_bits is 3, so an extra migratetype_bit is needed.
> But current migratetype implementation is a word-based operation, requiring
> NR_PAGEBLOCK_BITS to be divisor of BITS_PER_LONG. This means NR_PAGEBLOCK_BITS
> needs to be increased from 4 to 8 to meet the requirement, wasting a lot of space.
... until I did the math. Let's assume a pageblock is 2 MiB.
4/(2* 1024 * 1024 * 8) = 0,00002384185791016 %
8/(2* 1024 * 1024 * 8) -> 1 / (2* 1024 * 1024) = 0,00004768371582031 %
For a 1 TiB machine that means 256 KiB vs. 512 KiB
I concluded that "wasting a lot of space" is not really the right word
to describe that :)
Just to put it into perspective, the memmap (64/4096) for a 1 TiB
machine is ... 16 GiB.
> An alternative is to have a separate array for MIGRATE_ISOLATE, which requires
> additional changes. Let me know if you have a better idea. Thanks.
It would probably be cleanest to just use one byte per pageblock. That
would cleanup the whole machinery eventually as well.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists