Message-ID: <f877d493-8d06-43da-a4cb-f056d60dd921@126.com>
Date: Sun, 22 Dec 2024 19:50:45 +0800
From: Ge Yang <yangge1116@....com>
To: David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org,
 21cnbao@...il.com, baolin.wang@...ux.alibaba.com, muchun.song@...ux.dev,
 liuzixing@...on.cn, Oscar Salvador <osalvador@...e.de>,
 Michal Hocko <mhocko@...nel.org>
Subject: Re: [PATCH] replace free hugepage folios after migration



On 2024/12/21 22:32, David Hildenbrand wrote:
> On 21.12.24 13:04, Ge Yang wrote:
>>
>>
>> On 2024/12/21 0:30, David Hildenbrand wrote:
>>> On 20.12.24 09:56, Ge Yang wrote:
>>>>
>>>>
>>>> On 2024/12/20 0:40, David Hildenbrand wrote:
>>>>> On 18.12.24 07:33, yangge1116@....com wrote:
>>>>>> From: yangge <yangge1116@....com>
>>>>>
>>>>> CCing Oscar, who worked on migrating these pages during memory
>>>>> offlining and alloc_contig_range().
>>>>>
>>>>>>
>>>>>> My machine has 4 NUMA nodes, each equipped with 32GB of memory. I
>>>>>> have configured each NUMA node with 16GB of CMA and 16GB of in-use
>>>>>> hugetlb pages. The allocation of contiguous memory via the
>>>>>> cma_alloc() function can fail probabilistically.
>>>>>>
>>>>>> The cma_alloc() function may fail if it sees an in-use hugetlb page
>>>>>> within the allocation range, even if that page has already been
>>>>>> migrated. When in-use hugetlb pages are migrated, they may simply
>>>>>> be released back into the free hugepage pool instead of being
>>>>>> returned to the buddy system. This can cause the
>>>>>> test_pages_isolated() function check to fail, ultimately leading
>>>>>> to the failure of the cma_alloc() function:
>>>>>> cma_alloc()
>>>>>>        __alloc_contig_migrate_range() // migrate in-use hugepage
>>>>>>        test_pages_isolated()
>>>>>>            __test_page_isolated_in_pageblock()
>>>>>>                 PageBuddy(page) // check if the page is in buddy
>>>>>
>>>>> I thought this would be working as expected, at least we tested it
>>>>> with alloc_contig_range / virtio-mem a while ago.
>>>>>
>>>>> On the memory_offlining path, we migrate hugetlb folios, but also
>>>>> dissolve any remaining free folios even if it means that we will go
>>>>> below the requested number of hugetlb pages in our pool.
>>>>>
>>>>> During alloc_contig_range(), we only migrate them, to then free them
>>>>> up after migration.
>>>>>
>>>>> Under which circumstances does it apply that "they may simply be
>>>>> released back into the free hugepage pool instead of being returned to
>>>>> the buddy system"?
>>>>>
>>>>
>>>> After migration, in-use hugetlb pages are only released back to the
>>>> hugetlb pool and are not returned to the buddy system.
>>>
>>> We had
>>>
>>> commit ae37c7ff79f1f030e28ec76c46ee032f8fd07607
>>> Author: Oscar Salvador <osalvador@...e.de>
>>> Date:   Tue May 4 18:35:29 2021 -0700
>>>
>>>       mm: make alloc_contig_range handle in-use hugetlb pages
>>>
>>>       alloc_contig_range() will fail if it finds a HugeTLB page within
>>>       the range, without a chance to handle them.  Since HugeTLB pages
>>>       can be migrated as any LRU or Movable page, it does not make sense
>>>       to bail out without trying.  Enable the interface to recognize
>>>       in-use HugeTLB pages so we can migrate them, and have much better
>>>       chances to succeed the call.
>>>
>>>
>>> And I am trying to figure out if it never worked correctly, or if
>>> something changed that broke it.
>>>
>>>
>>> In start_isolate_page_range()->isolate_migratepages_block(), we do the
>>>
>>>       ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
>>>
>>> to add these folios to the cc->migratepages list.
>>>
>>> In __alloc_contig_migrate_range(), we migrate the pages using
>>> migrate_pages().
>>>
>>>
>>> After that, the src hugetlb folios should still be isolated?
>> Yes.
>>
>>> But I'm getting confused when these pages get un-isolated and putback
>>> to hugetlb/freed.
>>>
>> If the migration is successful, folio_putback_active_hugetlb() is called
>> to release the src hugetlb folios back to the free hugetlb pool.
>>
>> trace:
>> unmap_and_move_huge_page
>>       folio_putback_active_hugetlb
>>           folio_put
>>               free_huge_folio
>>
>> alloc_contig_range_noprof
>>       __alloc_contig_migrate_range
>>       if (test_pages_isolated())  // check whether hugetlb pages are in buddy
>>           isolate_freepages_range //grab isolated pages from freelists.
>>       else
>>           undo_isolate_page_range //undo isolate
> 
> Ah, now I remember, thanks.
> 
> So when we free an ordinary page, we put it onto the buddy isolate list, 
> from where we can grab it later and nobody can allocate it in the meantime.
> 
> In case of hugetlb, we simply free it back to hugetlb, from where it can 
> likely even get allocated immediately again.
> 
> I think that can actually happen in your proposal: the now-free page 
> will get reallocated, for example for migrating the next folio. Or some 
> concurrent system activity can simply allocate the now-free folio. Or am 
> I missing something that prevents these now-free hugetlb folios from 
> getting re-allocated after migration succeeded?
> 
> 
> Conceptually, I think we would want migration code in the case of 
> alloc_contig_range() to allocate a new folio from the buddy, and to free 
> the old one back to the buddy immediately, without ever allowing
> re-allocation of it.
> 
> What needs to be handled is detecting that
> 
> (a) we want to allocate a fresh hugetlb folio as migration target
> (b) if migration succeeds, we have to free the hugetlb folio back to the 
> buddy
> (c) if migration fails, we have to free the allocated hugetlb folio back
> to the buddy
> 
> 
> We could provide a custom alloc_migration_target that we pass to
> migrate_pages() to allocate a fresh hugetlb folio to handle (a). Using the
> put_new_folio callback we could handle (c). (b) would need some thought.
It seems that if we allocate a fresh hugetlb folio as the migration 
target, the source hugetlb folio will be automatically released back to 
the buddy system.
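
If we go that way, the callback pair could look roughly like the sketch
below. This is untested and only meant to illustrate the shape;
alloc_contig_new_hugetlb_folio()/put_contig_new_hugetlb_folio() are made-up
names, and the fresh-from-buddy allocation/free they wrap would need new
helpers (the existing ones are static in mm/hugetlb.c):

#include <linux/hugetlb.h>
#include <linux/migrate.h>

/* (a) migration target: a fresh hugetlb folio taken from buddy,
 * not one dequeued from the free hugetlb pool */
static struct folio *alloc_contig_new_hugetlb_folio(struct folio *src,
						    unsigned long private)
{
	struct hstate *h = folio_hstate(src);

	return hugetlb_alloc_fresh_folio(h, folio_nid(src)); /* hypothetical helper */
}

/* (c) migration failed: return the unused target straight to buddy
 * instead of enqueueing it on the hugetlb free list */
static void put_contig_new_hugetlb_folio(struct folio *dst,
					 unsigned long private)
{
	hugetlb_free_fresh_folio(dst); /* hypothetical helper */
}

__alloc_contig_migrate_range() would then pass these two callbacks to
migrate_pages() instead of alloc_migration_target().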

> 
> Maybe we can also just mark the source folio as we isolate it, and 
> enlighten migration+freeing code to handle it automatically?
Can we determine whether a hugetlb page is isolated when allocating it 
from the free hugetlb pool?

dequeue_hugetlb_folio_node_exact() {
     list_for_each_entry(folio, &h->hugepage_freelists[nid], lru) {
         if (is_migrate_isolate_page(&folio->page)) { // skip folios in an isolated pageblock
              continue;
         }
     }
}
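
For reference, is_migrate_isolate_page() only reads the pageblock
migratetype, so a free hugetlb folio sitting in a range that
start_isolate_page_range() has marked MIGRATE_ISOLATE should be detectable
this way. From include/linux/page-isolation.h (only defined with
CONFIG_MEMORY_ISOLATION):

static inline bool is_migrate_isolate_page(struct page *page)
{
	return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
}

It takes a struct page *, which is why the sketch above passes &folio->page.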

> 
> Hoping to get some feedback from hugetlb maintainers.
> 

