[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <393367d3-eb69-4255-a7aa-dcbcbe7a403e@redhat.com>
Date: Wed, 20 Nov 2024 17:13:19 +0100
From: David Hildenbrand <david@...hat.com>
To: Zi Yan <ziy@...dia.com>
Cc: Yu Zhao <yuzhao@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>,
Muchun Song <muchun.song@...ux.dev>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH mm-unstable v2 1/3] mm/contig_alloc: support __GFP_COMP
On 20.11.24 16:55, Zi Yan wrote:
> On 19 Nov 2024, at 11:52, David Hildenbrand wrote:
>
>> On 19.11.24 17:49, David Hildenbrand wrote:
>>> On 19.11.24 17:41, Zi Yan wrote:
>>>> On 19 Nov 2024, at 11:31, David Hildenbrand wrote:
>>>>
>>>>> On 19.11.24 17:12, Zi Yan wrote:
>>>>>> On 19 Nov 2024, at 10:29, David Hildenbrand wrote:
>>>>>>
>>>>>>>> +/* Split a multi-block free page into its individual pageblocks. */
>>>>>>>> +static void split_large_buddy(struct zone *zone, struct page *page,
>>>>>>>> + unsigned long pfn, int order, fpi_t fpi)
>>>>>>>> +{
>>>>>>>> + unsigned long end = pfn + (1 << order);
>>>>>>>> +
>>>>>>>> + VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>>>>>>> + /* Caller removed page from freelist, buddy info cleared! */
>>>>>>>> + VM_WARN_ON_ONCE(PageBuddy(page));
>>>>>>>> +
>>>>>>>> + if (order > pageblock_order)
>>>>>>>> + order = pageblock_order;
>>>>>>>> +
>>>>>>>> + while (pfn != end) {
>>>>>>>> + int mt = get_pfnblock_migratetype(page, pfn);
>>>>>>>> +
>>>>>>>> + __free_one_page(page, pfn, zone, order, mt, fpi);
>>>>>>>> + pfn += 1 << order;
>>>>>>>> + page = pfn_to_page(pfn);
>>>>>>>> + }
>>>>>>>> +}
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> stumbling over this while digging through the code ....
>>>>>>>
>>>>>>>> +
>>>>>>>> static void free_one_page(struct zone *zone, struct page *page,
>>>>>>>> unsigned long pfn, unsigned int order,
>>>>>>>> fpi_t fpi_flags)
>>>>>>>> {
>>>>>>>> unsigned long flags;
>>>>>>>> - int migratetype;
>>>>>>>> spin_lock_irqsave(&zone->lock, flags);
>>>>>>>> - migratetype = get_pfnblock_migratetype(page, pfn);
>>>>>>>> - __free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
>>>>>>>
>>>>>>> This change is rather undesired:
>>>>>>>
>>>>>>> via __free_pages_core()->__free_pages_ok() we can easily end up here with order=MAX_PAGE_ORDER.
>>>>>>
>>>>>> Do you have a concrete example? PMD THP on x86_64 is pageblock_order.
>>>>>> We do not have PMD level mTHP yet. Any other possible source?
>>>>>>
>>>>>>>
>>>>>>> What your new code will do is split this perfectly reasonable MAX_PAGE_ORDER chunk via split_large_buddy() into pageblock-sized chunks, and let the buddy merging logic undo our unnecessary splitting.
>>>>>>>
>>>>>>> Is there a way to avoid this and just process the whole MAX_PAGE_ORDER chunk like we used to?
>>>>>>
>>>>>> Probably split_large_buddy() can check the migratetypes of the to-be-freed
>>>>>> page, if order > pageblock_order. If all migratetypes are the same, the page
>>>>>> can be freed at MAX_PAGE_ORDER, otherwise pageblock_order.
>>>>>
>>>>> Thinking about this: why do we care about the migratetype?
>>>>>
>>>>> We only have to fallback to pageblocks if any pageblock is "MIGRATE_ISOLATE" (and maybe MIGRATE_CMA), but not all. Otherwise, we can just ignore the migratetype (or rather overwrite it)
>>>>
>>>> There are VM_WARN_ONCEs around *_free_list() operations to make sure
>>>> page migratetype matches the migratetype of the list it is on. Ignoring
>>>> migratetype would trigger these WARNs. Overwriting it would work but
>>>> free page migratetype accounting needs to be taken care of.
>>>>
>>>> An implicit reason is that __free_one_page() does not support >MAX_PAGE_ORDER
>>>> and gigantic hugetlb folios are freed via free_one_page(), where
>>>> split_large_buddy() is used to work around the limitation.
>>>
>>> Yes, I saw that change. But that can be easily identified cased by
>>> unlikely(order > MAX_PAGE_ORDER) to handle that rare case in a special way.
>>>
>>> > > For the two memory init cases you mentioned in the other email,
>>> maybe a new
>>>> fpi flag to make free_one_page() use __free_one_page() for them,
>>>> since migratetypes should be the same across MAX_PAGE_ORDER range there, right?
>>>
>>> In the context of alloc_frozen_range()/free_frozen_range() I want to
>>> free MAX_PAGE_ORDER chunks when possible, and not have some odd logic in
>>> the freeing path undo some of that effort.
>>
>> Adding a pointer to that discussion:
>>
>> https://lkml.kernel.org/r/ZzZdnuZBf-xgiwD2@casper.infradead.org
>
> Thanks.
>
> So you are thinking about something like this:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b6958333054d..3d3341dc1ad1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1254,7 +1254,12 @@ static void free_one_page(struct zone *zone, struct page *page,
> unsigned long flags;
>
> spin_lock_irqsave(&zone->lock, flags);
> - split_large_buddy(zone, page, pfn, order, fpi_flags);
> + if (unlikely(order > MAX_PAGE_ORDER))
> + split_large_buddy(zone, page, pfn, order, fpi_flags);> + else {
> + int migratetype = get_pfnblock_migratetype(page, pfn);
> + __free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
> + }
> spin_unlock_irqrestore(&zone->lock, flags);
>
> __count_vm_events(PGFREE, 1 << order);
>
Something in that direction, but likely something like:
if (likely(order <= pageblock_order)) {
int migratetype = get_pfnblock_migratetype(page, pfn);
__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
} else if (order <= MAX_PAGE_ORDER) {
/* We might have to split when some pageblocks differ. */
maybe_split_large_buddy(zone, page, pfn, order, fpi_flags);
} else {
/* The buddy works in max MAX_PAGE_ORDER chunks. */
for /* each MAX_PAGE_ORDER chunk */
maybe_split_large_buddy(zone, page, pfn, MAX_PAGE_ORDER, fpi_flags);
pfn += MAX_ORDER_NR_PAGES;
page = ...
}
}
Whereby maybe_split_large_buddy would check if it has to do anything with the
pageblocks, or whether it can just call __free_one_page(order). It would only
accept order <=
In the future with free_frozen_pages(), the last else case would vanish.
>
> Is it possible to have a MAX_PAGE_ORDER hugetlb folio? If no, we are good.
Hm, I wouldn't rule it out. To be future proof, we'd likely should add a
way to handle it. Shouldn't be too hard and expensive: simply read all
pageblock migratetypes.
I wonder if there are configs (no page isolation?) where we don't have to
check anything.
> If yes, alloc_contig_range() can change the migratetype of one of half that folio
> and during migration phase, that folio will be freed via __free_one_page()
> and causing migratetype mismatch.
Can you remind me how that happens?
Likely, we isolate a single pageblock, and shortly after we free a larger
page that covers multiple pageblocks? So this could affect other
alloc_contig_range() users as well, I assume?
Thanks!
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists