Message-ID: <16391A0B-4E3C-4F3E-8904-9034DC87DC52@nvidia.com>
Date: Wed, 20 Nov 2024 10:55:09 -0500
From: Zi Yan <ziy@...dia.com>
To: David Hildenbrand <david@...hat.com>
Cc: Yu Zhao <yuzhao@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>,
Muchun Song <muchun.song@...ux.dev>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH mm-unstable v2 1/3] mm/contig_alloc: support __GFP_COMP
On 19 Nov 2024, at 11:52, David Hildenbrand wrote:
> On 19.11.24 17:49, David Hildenbrand wrote:
>> On 19.11.24 17:41, Zi Yan wrote:
>>> On 19 Nov 2024, at 11:31, David Hildenbrand wrote:
>>>
>>>> On 19.11.24 17:12, Zi Yan wrote:
>>>>> On 19 Nov 2024, at 10:29, David Hildenbrand wrote:
>>>>>
>>>>>>> +/* Split a multi-block free page into its individual pageblocks. */
>>>>>>> +static void split_large_buddy(struct zone *zone, struct page *page,
>>>>>>> + unsigned long pfn, int order, fpi_t fpi)
>>>>>>> +{
>>>>>>> + unsigned long end = pfn + (1 << order);
>>>>>>> +
>>>>>>> + VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>>>>>> + /* Caller removed page from freelist, buddy info cleared! */
>>>>>>> + VM_WARN_ON_ONCE(PageBuddy(page));
>>>>>>> +
>>>>>>> + if (order > pageblock_order)
>>>>>>> + order = pageblock_order;
>>>>>>> +
>>>>>>> + while (pfn != end) {
>>>>>>> + int mt = get_pfnblock_migratetype(page, pfn);
>>>>>>> +
>>>>>>> + __free_one_page(page, pfn, zone, order, mt, fpi);
>>>>>>> + pfn += 1 << order;
>>>>>>> + page = pfn_to_page(pfn);
>>>>>>> + }
>>>>>>> +}
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> stumbling over this while digging through the code ....
>>>>>>
>>>>>>> +
>>>>>>> static void free_one_page(struct zone *zone, struct page *page,
>>>>>>> unsigned long pfn, unsigned int order,
>>>>>>> fpi_t fpi_flags)
>>>>>>> {
>>>>>>> unsigned long flags;
>>>>>>> - int migratetype;
>>>>>>> spin_lock_irqsave(&zone->lock, flags);
>>>>>>> - migratetype = get_pfnblock_migratetype(page, pfn);
>>>>>>> - __free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
>>>>>>
>>>>>> This change is rather undesired:
>>>>>>
>>>>>> via __free_pages_core()->__free_pages_ok() we can easily end up here with order=MAX_PAGE_ORDER.
>>>>>
>>>>> Do you have a concrete example? PMD THP on x86_64 is pageblock_order.
>>>>> We do not have PMD level mTHP yet. Any other possible source?
>>>>>
>>>>>>
>>>>>> What your new code will do is split this perfectly reasonable MAX_PAGE_ORDER chunk via split_large_buddy() into pageblock-sized chunks, and let the buddy merging logic undo our unnecessary splitting.
>>>>>>
>>>>>> Is there a way to avoid this and just process the whole MAX_PAGE_ORDER chunk like we used to?
>>>>>
>>>>> Probably split_large_buddy() can check the migratetypes of the to-be-freed
>>>>> page, if order > pageblock_order. If all migratetypes are the same, the page
>>>>> can be freed at MAX_PAGE_ORDER, otherwise pageblock_order.
>>>>
>>>> Thinking about this: why do we care about the migratetype?
>>>>
>>>> We only have to fallback to pageblocks if any pageblock is "MIGRATE_ISOLATE" (and maybe MIGRATE_CMA), but not all. Otherwise, we can just ignore the migratetype (or rather overwrite it)
>>>
>>> There are VM_WARN_ONCEs around *_free_list() operations to make sure
>>> page migratetype matches the migratetype of the list it is on. Ignoring
>>> migratetype would trigger these WARNs. Overwriting it would work but
>>> free page migratetype accounting needs to be taken care of.
>>>
>>> An implicit reason is that __free_one_page() does not support >MAX_PAGE_ORDER
>>> and gigantic hugetlb folios are freed via free_one_page(), where
>>> split_large_buddy() is used to work around the limitation.
>>
>> Yes, I saw that change. But that rare case can easily be identified by
>> unlikely(order > MAX_PAGE_ORDER) and handled in a special way.
>>
>>> For the two memory init cases you mentioned in the other email, maybe a new
>>> fpi flag to make free_one_page() use __free_one_page() for them, since
>>> migratetypes should be the same across the MAX_PAGE_ORDER range there, right?
>>
>> In the context of alloc_frozen_range()/free_frozen_range() I want to
>> free MAX_PAGE_ORDER chunks when possible, and not have some odd logic in
>> the freeing path undo some of that effort.
>
> Adding a pointer to that discussion:
>
> https://lkml.kernel.org/r/ZzZdnuZBf-xgiwD2@casper.infradead.org
Thanks.
So you are thinking about something like this:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b6958333054d..3d3341dc1ad1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1254,7 +1254,12 @@ static void free_one_page(struct zone *zone, struct page *page,
 	unsigned long flags;
 
 	spin_lock_irqsave(&zone->lock, flags);
-	split_large_buddy(zone, page, pfn, order, fpi_flags);
+	if (unlikely(order > MAX_PAGE_ORDER)) {
+		split_large_buddy(zone, page, pfn, order, fpi_flags);
+	} else {
+		int migratetype = get_pfnblock_migratetype(page, pfn);
+		__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
+	}
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	__count_vm_events(PGFREE, 1 << order);
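This would keep the common order <= MAX_PAGE_ORDER free as a single
__free_one_page() call and only pay the pageblock-split cost for the rare
>MAX_PAGE_ORDER (gigantic hugetlb) case.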
Is it possible to have a MAX_PAGE_ORDER hugetlb folio? If not, we are good.
If yes, alloc_contig_range() could change the migratetype of one half of that
folio, and during the migration phase that folio would be freed via
__free_one_page(), causing a migratetype mismatch.
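If that scenario is possible, one way out would be the migratetype check I
suggested earlier in the thread. A minimal sketch, assuming a hypothetical
helper (pfnblock_migratetypes_match() is a made-up name, not in the tree):

/*
 * Sketch only: return true if every pageblock in the order-sized range
 * shares the migratetype of the first block, so the whole range could be
 * freed as a single buddy without a migratetype mismatch.
 */
static bool pfnblock_migratetypes_match(struct page *page, unsigned long pfn,
					unsigned int order)
{
	unsigned long end = pfn + (1UL << order);
	int mt = get_pfnblock_migratetype(page, pfn);

	for (pfn += pageblock_nr_pages; pfn < end; pfn += pageblock_nr_pages) {
		if (get_pfnblock_migratetype(pfn_to_page(pfn), pfn) != mt)
			return false;
	}
	return true;
}

free_one_page() would then call __free_one_page() directly only when this
returns true, and fall back to split_large_buddy() otherwise.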
Best Regards,
Yan, Zi