[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <48D48FDC-B8DD-41C9-B56A-EBD7314883AB@nvidia.com>
Date:   Fri, 20 May 2022 09:43:40 -0400
From:   Zi Yan <ziy@...dia.com>
To:     Qian Cai <quic_qiancai@...cinc.com>
Cc:     David Hildenbrand <david@...hat.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org,
        Vlastimil Babka <vbabka@...e.cz>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Eric Ren <renzhengeek@...il.com>,
        Mike Rapoport <rppt@...nel.org>,
        Oscar Salvador <osalvador@...e.de>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v11 0/6] Use pageblock_order for cma and alloc_contig_range alignment.
On 20 May 2022, at 7:30, Qian Cai wrote:
> On Thu, May 19, 2022 at 05:35:15PM -0400, Zi Yan wrote:
>> Do you have a complete reproducer? From your printout, it is clear that a 512-page compound
>> page caused the infinite loop, because the page was not migrated and the code kept
>> retrying. But __alloc_contig_migrate_range() is supposed to return non-zero to tell the
>> code the page cannot be migrated and the code will goto failed without retrying. It will be
>> great you can share what exactly has run after boot, so that I can reproduce locally to
>> identify what makes __alloc_contig_migrate_range() return 0 without migrating the page.
>
> The reproducer is just to run the same script I shared with you previously
> multiple times instead. It is still quite reproducible here as it usually
> happens within a hour.
>
> $ for i in `seq 1 100`; do ./flip_mem.py; done
>
>> Can you also try the patch below to see if it fixes the infinite loop?
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index b3f074d1682e..abde1877bbcb 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -417,10 +417,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>>                                 order = 0;
>>                                 outer_pfn = pfn;
>>                                 while (!PageBuddy(pfn_to_page(outer_pfn))) {
>> -                                       if (++order >= MAX_ORDER) {
>> -                                               outer_pfn = pfn;
>> -                                               break;
>> -                                       }
>> +                                       /* abort if the free page cannot be found */
>> +                                       if (++order >= MAX_ORDER)
>> +                                               goto failed;
>>                                         outer_pfn &= ~0UL << order;
>>                                 }
>>                                 pfn = outer_pfn;
>>
>
> Can you explain a bit how this patch is the right thing to do here? I am a
> little bit worry about shooting into the dark. Otherwise, I'll be running
> the off-by-one part over the weekend to see if that helps.
The code kept retrying to migrate a 512-page compound page, so it seems to me
that __alloc_contig_migrate_range() did not migrate the page but returned
0 every time, otherwise, if (ret) goto failed; would bail out of the loop
already. The original code above assumed a free page can always be found after
__alloc_contig_migrate_range(), so it will retry if no free page is found.
But that assumption is not true from your infinite loop result, the new
code quits retrying when no free page can be found.
I will dig into it deeper to make sure it is the correct fix. I will
update you when I am done.
Thanks.
--
Best Regards,
Yan, Zi
Download attachment "signature.asc" of type "application/pgp-signature" (855 bytes)
Powered by blists - more mailing lists
 
