linux-kernel - Re: [PATCH v2 07/11] hugetlb: perform vmemmap restoration on a list of pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <BFE00D00-1DFF-4AF8-BB10-5B76BF06E012@linux.dev>
Date:   Mon, 11 Sep 2023 11:10:18 +0800
From:   Muchun Song <muchun.song@...ux.dev>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Linux-MM <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
        Muchun Song <songmuchun@...edance.com>,
        Joao Martins <joao.m.martins@...cle.com>,
        Oscar Salvador <osalvador@...e.de>,
        David Hildenbrand <david@...hat.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        David Rientjes <rientjes@...gle.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Michal Hocko <mhocko@...e.com>,
        Matthew Wilcox <willy@...radead.org>,
        Xiongchun Duan <duanxiongchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2 07/11] hugetlb: perform vmemmap restoration on a list
 of pages



> On Sep 9, 2023, at 04:53, Mike Kravetz <mike.kravetz@...cle.com> wrote:
> 
> On 09/07/23 11:54, Mike Kravetz wrote:
>> On 09/07/23 11:33, Muchun Song wrote:
>>> 
>>> 
>>>> On Sep 7, 2023, at 05:12, Mike Kravetz <mike.kravetz@...cle.com> wrote:
>>>> 
>>>> On 09/06/23 16:07, Muchun Song wrote:
>>>>>> On Sep 6, 2023, at 15:33, Muchun Song <muchun.song@...ux.dev> wrote:
>>>>>> On 2023/9/6 05:44, Mike Kravetz wrote:
>>>>>>> When removing hugetlb pages from the pool, we first create a list
>>>>>>> of removed pages and then free those pages back to low level allocators.
>>>>>>> Part of the 'freeing process' is to restore vmemmap for all base pages
>>>>>>> if necessary.  Pass this list of pages to a new routine
>>>>>>> hugetlb_vmemmap_restore_folios() so that vmemmap restoration can be
>>>>>>> performed in bulk.
>>>>>>> 
>>>>>>> Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>
>>>>>>> ---
>>>>>>> mm/hugetlb.c         |  3 +++
>>>>>>> mm/hugetlb_vmemmap.c | 13 +++++++++++++
>>>>>>> mm/hugetlb_vmemmap.h |  5 +++++
>>>>>>> 3 files changed, 21 insertions(+)
>>>>>>> 
>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>> index 554be94b07bd..dd2dbc256172 100644
>>>>>>> --- a/mm/hugetlb.c
>>>>>>> +++ b/mm/hugetlb.c
>>>>>>> @@ -1838,6 +1838,9 @@ static void update_and_free_pages_bulk(struct hstate *h, struct list_head *list)
>>>>>>> {
>>>>>>> struct folio *folio, *t_folio;
>>>>>>> + /* First restore vmemmap for all pages on list. */
>>>>>>> + hugetlb_vmemmap_restore_folios(h, list);
>>>>>>> +
>>>>>>> list_for_each_entry_safe(folio, t_folio, list, lru) {
>>>>>>> update_and_free_hugetlb_folio(h, folio, false);
>>>>>>> cond_resched();
>>>>>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>>>>>> index ac5577d372fe..79de984919ef 100644
>>>>>>> --- a/mm/hugetlb_vmemmap.c
>>>>>>> +++ b/mm/hugetlb_vmemmap.c
>>>>>>> @@ -481,6 +481,19 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
>>>>>>> return ret;
>>>>>>> }
>>>>>>> +/*
>>>>>>> + * This function will attempt to resore vmemmap for a list of folios.  There
>>>>>>> + * is no guarantee that restoration will be successful for all or any folios.
>>>>>>> + * This is used in bulk operations, and no feedback is given to the caller.
>>>>>>> + */
>>>>>>> +void hugetlb_vmemmap_restore_folios(const struct hstate *h, struct list_head *folio_list)
>>>>>>> +{
>>>>>>> + struct folio *folio;
>>>>>>> +
>>>>>>> + list_for_each_entry(folio, folio_list, lru)
>>>>>>> + (void)hugetlb_vmemmap_restore(h, &folio->page);
>>>>>> 
>>>>>> I am curious about the purpose of "void" here, seems it it not necessnary,
>>>>>> ritgh? We cound see so many palces where we do not add the void if the caller
>>>>>> does not care about the return value of the callee.
>>>>> 
>>>>> Another question: should we stop restoring vmemmap pages when
>>>>> hugetlb_vmemmap_restore() fails? In which case, I suspect there
>>>>> is no memory probably, there is no need to continue, right?
>>>> 
>>>> Recall that the list of hugetlb pages may be from multiple nodes.  My first
>>>> thought was that we should continue because memory allocation may fail on one
>>>> node but succeed on another.  However, with
>>>> https://lore.kernel.org/linux-mm/20230905031312.91929-1-yuancan@huawei.com/
>>>> memory allocation should fall back to other nodes.  So, yes I do believe it
>>>> would make sense to stop when hugetlb_vmemmap_restore returns ENOMEM as
>>>> we are unlikely to make forward progress.
>>> 
>>> Agree.
>>> 
>>>> 
>>>> Today's behavior will try to restore vmemmap for all pages.  No stopping
>>>> on error.
>>>> 
>>>> I have mixed thoughts on this.  Quitting on error 'seems reasonable'.
>>>> However, if we continue we 'might' be able to allocate vmemmap for one
>>>> hugetlb page.  And, if we free one hugetlb page that should provide
>>>> vmemmap for several more and we may be able to free most pages on the
>>>> list.
>>> 
>>> Yes. A good point. But there should be a non-optimized huge page been
>>> freed somewhere in parallel, otherwise we still cannot allocate memory.
>> 
>> It does not have to be another huge page being freed in parallel.  It
>> could be that when allocating vmemmap for a 1G hugetlb page we were one
>> (4K) page short of what was required.  If someone else frees a 4K page,
>> freeing the next 1G page may succeed.

Right. I missed that.

>> -- 
>> Mike Kravetz
>> 
>>> However, the freeing operation happens after hugetlb_vmemmap_restore_folios.
>>> If we want to handle this, we should rework update_and_free_pages_bulk()
>>> to do a try when at least a huge pages is freed.
> 
> This seemed familiar.  Recall this patch which Muchun Reviewed and James Acked,
> https://lore.kernel.org/linux-mm/20230718004942.113174-3-mike.kravetz@oracle.com/
> 
> If we can not restore vmemmap for a page, then it must be turned into a
> surplus huge page.  In this patch (not the previous one referenced), we
> will try to restore vmemmap one more time in a later call to
> update_and_free_hugetlb_folio.  Certainly, we do not want to try twice!
> 
> My 'plan' is to include the previous patch as part of this series.  With
> that patch in place, the list_for_each_entry calls to hugetlb_vmemmap_restore
> can be replaced with a call to hugetlb_vmemmap_restore_folios.  We would
> change the behavior of hugetlb_vmemmap_restore_folios to return an error
> instead of being of type void.  If an error is returned, then we will
> make another pass through the list looking for unoptimized pages and add
> them as surplus.
> 
> I think it best if we try to restore vmemmap at least once before
> converting to a surplus page.

Make sense.

Muchun

> -- 
> Mike Kravetz