Message-ID: <f877d493-8d06-43da-a4cb-f056d60dd921@126.com>
Date: Sun, 22 Dec 2024 19:50:45 +0800
From: Ge Yang <yangge1116@....com>
To: David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org,
21cnbao@...il.com, baolin.wang@...ux.alibaba.com, muchun.song@...ux.dev,
liuzixing@...on.cn, Oscar Salvador <osalvador@...e.de>,
Michal Hocko <mhocko@...nel.org>
Subject: Re: [PATCH] replace free hugepage folios after migration
On 2024/12/21 22:32, David Hildenbrand wrote:
> On 21.12.24 13:04, Ge Yang wrote:
>>
>>
>> On 2024/12/21 0:30, David Hildenbrand wrote:
>>> On 20.12.24 09:56, Ge Yang wrote:
>>>>
>>>>
>>>> On 2024/12/20 0:40, David Hildenbrand wrote:
>>>>> On 18.12.24 07:33, yangge1116@....com wrote:
>>>>>> From: yangge <yangge1116@....com>
>>>>>
>>>>> CCing Oscar, who worked on migrating these pages during memory
>>>>> offlining
>>>>> and alloc_contig_range().
>>>>>
>>>>>>
>>>>>> My machine has 4 NUMA nodes, each equipped with 32GB of memory. I
>>>>>> have configured each NUMA node with 16GB of CMA and 16GB of in-use
>>>>>> hugetlb pages. The allocation of contiguous memory via the
>>>>>> cma_alloc() function can fail probabilistically.
>>>>>>
>>>>>> The cma_alloc() function may fail if it sees an in-use hugetlb page
>>>>>> within the allocation range, even if that page has already been
>>>>>> migrated. When in-use hugetlb pages are migrated, they may simply
>>>>>> be released back into the free hugepage pool instead of being
>>>>>> returned to the buddy system. This can cause the
>>>>>> test_pages_isolated() function check to fail, ultimately leading
>>>>>> to the failure of the cma_alloc() function:
>>>>>> cma_alloc()
>>>>>>     __alloc_contig_migrate_range()  // migrate in-use hugetlb pages
>>>>>>         test_pages_isolated()
>>>>>>             __test_page_isolated_in_pageblock()
>>>>>>                 PageBuddy(page)     // check if the page is in the buddy
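
(For context: test_pages_isolated() only succeeds if every page in the
range is found in the buddy; a free hugetlb folio is not PageBuddy, so
the scan below, sketched in simplified form from mm/page_isolation.c,
stops early and the check fails:)

    /* simplified sketch of __test_page_isolated_in_pageblock() */
    static unsigned long
    __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
    {
        struct page *page;

        while (pfn < end_pfn) {
            page = pfn_to_page(pfn);
            if (PageBuddy(page))
                /* a free buddy page: skip the whole order */
                pfn += 1 << buddy_order(page);
            else
                break;  /* e.g. a free hugetlb folio stops the scan */
        }
        return pfn;     /* caller treats pfn < end_pfn as failure */
    }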
>>>>>
>>>>> I thought this would be working as expected; at least we tested it
>>>>> with alloc_contig_range / virtio-mem a while ago.
>>>>>
>>>>> On the memory_offlining path, we migrate hugetlb folios, but also
>>>>> dissolve any remaining free folios, even if it means that we will go
>>>>> below the requested number of hugetlb pages in our pool.
>>>>>
>>>>> During alloc_contig_range(), we only migrate them, to then free them
>>>>> up after migration.
>>>>>
>>>>> Under which circumstances does it apply that "they may simply be
>>>>> released back into the free hugepage pool instead of being returned to
>>>>> the buddy system"?
>>>>>
>>>>
>>>> After migration, in-use hugetlb pages are only released back to the
>>>> hugetlb pool and are not returned to the buddy system.
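
(A simplified sketch of the relevant branch in free_huge_folio() from
mm/hugetlb.c, with reservation/error handling omitted, shows why:)

    void free_huge_folio(struct folio *folio)
    {
        struct hstate *h = folio_hstate(folio);
        int nid = folio_nid(folio);

        spin_lock_irq(&hugetlb_lock);
        if (h->surplus_huge_pages_node[nid]) {
            /* surplus folios do go back to the buddy allocator */
            remove_hugetlb_folio(h, folio, true);
            spin_unlock_irq(&hugetlb_lock);
            update_and_free_hugetlb_folio(h, folio, true);
        } else {
            /* the normal case: re-enqueue on the hstate free list,
             * where it can immediately be allocated again */
            enqueue_hugetlb_folio(h, folio);
            spin_unlock_irq(&hugetlb_lock);
        }
    }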
>>>
>>> We had
>>>
>>> commit ae37c7ff79f1f030e28ec76c46ee032f8fd07607
>>> Author: Oscar Salvador <osalvador@...e.de>
>>> Date:   Tue May 4 18:35:29 2021 -0700
>>>
>>>     mm: make alloc_contig_range handle in-use hugetlb pages
>>>
>>>     alloc_contig_range() will fail if it finds a HugeTLB page within the
>>>     range, without a chance to handle them.  Since HugeTLB pages can be
>>>     migrated as any LRU or Movable page, it does not make sense to bail
>>>     out without trying.  Enable the interface to recognize in-use HugeTLB
>>>     pages so we can migrate them, and have much better chances to succeed
>>>     the call.
>>>
>>>
>>> And I am trying to figure out if it never worked correctly, or if
>>> something changed that broke it.
>>>
>>>
>>> In start_isolate_page_range()->isolate_migratepages_block(), we do the
>>>
>>> ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
>>>
>>> to add these folios to the cc->migratepages list.
>>>
>>> In __alloc_contig_migrate_range(), we migrate the pages using
>>> migrate_pages().
>>>
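(For reference, the migrate_pages() call there looks roughly like this,
simplified from mm/page_alloc.c; mtc is the migration_target_control
consumed by the default alloc_migration_target() callback:)

    struct migration_target_control mtc = {
        .nid = zone_to_nid(cc->zone),
        .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
    };

    /* returns 0 on success, otherwise the number of pages left unmigrated */
    ret = migrate_pages(&cc->migratepages, alloc_migration_target, NULL,
                        (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE, NULL);
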
>>>
>>> After that, the src hugetlb folios should still be isolated?
>> Yes.
>>
>>> But I'm getting confused when these pages get un-isolated and put back
>>> to hugetlb/freed.
>>
>> If the migration is successful, folio_putback_active_hugetlb() is called
>> to release the src hugetlb folios back to the free hugetlb pool.
>>
>> trace:
>> unmap_and_move_huge_page
>>     folio_putback_active_hugetlb
>>         folio_put
>>             free_huge_folio
>>
>> alloc_contig_range_noprof
>>     __alloc_contig_migrate_range
>>     if (test_pages_isolated())     // check whether the pages are in the buddy
>>         isolate_freepages_range    // grab isolated pages from the freelists
>>     else
>>         undo_isolate_page_range    // undo the isolation
>
> Ah, now I remember, thanks.
>
> So when we free an ordinary page, we put it onto the buddy isolate list,
> from where we can grab it later and nobody can allocate it in the meantime.
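
(Roughly: the freed page keeps its pageblock's MIGRATE_ISOLATE
migratetype, and the allocator's fallback lists never include
MIGRATE_ISOLATE, so the page stays unreachable until isolation is
undone:)

    /* simplified: freeing a page inside an isolated pageblock */
    migratetype = get_pfnblock_migratetype(page, pfn); /* MIGRATE_ISOLATE */
    __free_one_page(page, pfn, zone, order, migratetype, FPI_NONE);
    /* the page now sits on free_area[order].free_list[MIGRATE_ISOLATE],
     * which rmqueue() never scans, so it cannot be re-allocated */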
>
> In case of hugetlb, we simply free it back to hugetlb, from where it can
> likely even get allocated immediately again.
>
> I think that can actually happen in your proposal: the now-free page
> will get reallocated, for example for migrating the next folio. Or some
> concurrent system activity can simply allocate the now-free folio. Or am
> I missing something that prevents these now-free hugetlb folios from
> getting re-allocated after migration succeeded?
>
>
> Conceptually, I think we would want migration code in the case of
> alloc_contig_range() to allocate a new folio from the buddy, and to free
> the old one back to the buddy immediately, without ever allowing re-
> allocation of it.
>
> What needs to be handled is detecting that
>
> (a) we want to allocate a fresh hugetlb folio as migration target
> (b) if migration succeeds, we have to free the hugetlb folio back to the
> buddy
> (c) if migration fails, we have to free the allocated hugetlb folio back
> to the buddy
>
>
> We could provide a custom alloc_migration_target that we pass to
> migrate_pages() to allocate a fresh hugetlb folio to handle (a). Using the
> put_new_folio callback we could handle (c). (b) would need some thought.
It seems that if we allocate a fresh hugetlb folio as the migration
target, the source hugetlb folio will be automatically released back to
the buddy system.
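
(A hypothetical sketch of those callbacks; alloc_fresh_hugetlb_target(),
free_fresh_hugetlb_target() and the buddy-backed allocation helper are
made-up names for illustration, not existing kernel API:)

    /* hypothetical: allocate the migration target fresh from the buddy,
     * bypassing the hstate free pool entirely */
    static struct folio *alloc_fresh_hugetlb_target(struct folio *src,
                                                    unsigned long private)
    {
        struct hstate *h = folio_hstate(src);

        /* made-up helper standing in for a buddy-backed allocation */
        return hugetlb_alloc_folio_from_buddy(h, folio_nid(src));
    }

    /* hypothetical: if migration fails, return the fresh folio to the buddy */
    static void free_fresh_hugetlb_target(struct folio *dst,
                                          unsigned long private)
    {
        update_and_free_hugetlb_folio(folio_hstate(dst), dst, false);
    }

    /* (a) via the get_new_folio callback, (c) via put_new_folio */
    migrate_pages(&cc->migratepages, alloc_fresh_hugetlb_target,
                  free_fresh_hugetlb_target, 0, cc->mode,
                  MR_CONTIG_RANGE, NULL);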
>
> Maybe we can also just mark the source folio as we isolate it, and
> enlighten migration+freeing code to handle it automatically?
Can we determine whether a hugetlb page is isolated when allocating it
from the free hugetlb pool?
dequeue_hugetlb_folio_node_exact()
{
    list_for_each_entry(folio, &h->hugepage_freelists[nid], lru) {
        /* skip free hugetlb folios whose pageblock is isolated */
        if (is_migrate_isolate_page(&folio->page))
            continue;
    }
}
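
(For reference, is_migrate_isolate_page() just checks the pageblock
migratetype; roughly, from include/linux/page-isolation.h:)

    static inline bool is_migrate_isolate_page(struct page *page)
    {
        return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
    }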
>
> Hoping to get some feedback from hugetlb maintainers.
>