Message-ID: <F6598450-EFE0-4EA9-912B-A727DE1F8185@nvidia.com>
Date: Thu, 19 May 2022 17:35:15 -0400
From: Zi Yan <ziy@...dia.com>
To: Qian Cai <quic_qiancai@...cinc.com>
Cc: David Hildenbrand <david@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org,
Vlastimil Babka <vbabka@...e.cz>,
Mel Gorman <mgorman@...hsingularity.net>,
Eric Ren <renzhengeek@...il.com>,
Mike Rapoport <rppt@...nel.org>,
Oscar Salvador <osalvador@...e.de>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v11 0/6] Use pageblock_order for cma and alloc_contig_range alignment.
On 19 May 2022, at 16:57, Qian Cai wrote:
> On Thu, Apr 28, 2022 at 08:39:06AM -0400, Zi Yan wrote:
>> How about the one attached? I can apply it to next-20220428. Let me know
>> if you are using a different branch. Thanks.
>
> Zi, it turns out that the endless loop in isolate_single_pageblock() can
> still be reproduced on today's linux-next tree by running the reproducer a
> few times. With this debug patch applied, it keeps printing the same
> values.
>
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -399,6 +399,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
> };
> INIT_LIST_HEAD(&cc.migratepages);
>
> + printk_ratelimited("KK stucked pfn=%lu head_pfn=%lu nr_pages=%lu boundary_pfn=%lu\n", pfn, head_pfn, nr_pages, boundary_pfn);
> ret = __alloc_contig_migrate_range(&cc, head_pfn,
> head_pfn + nr_pages);
>
> isolate_single_pageblock: 179 callbacks suppressed
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
Hi Qian,
Thanks for your testing.
Do you have a complete reproducer? From your printout, it is clear that a 512-page compound
page caused the infinite loop: the page was not migrated, and the code kept retrying. But
__alloc_contig_migrate_range() is supposed to return non-zero to tell the caller that the
page cannot be migrated, in which case the code jumps to the failed label without retrying.
It would be great if you could share exactly what was run after boot, so that I can
reproduce the problem locally and identify what makes __alloc_contig_migrate_range()
return 0 without migrating the page.
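
To make the suspected failure mode concrete, here is a minimal user-space sketch of it. It is not the kernel code: struct fake_range, the three migrate_* stand-ins, isolate_model() and the iteration cap are all hypothetical names invented for illustration. It only models the control flow described above, where a migration call that returns 0 without moving the page leaves pfn unchanged, so the loop never advances.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the compound page seen in the printout. */
struct fake_range {
	unsigned long head_pfn;	/* first pfn of the compound page */
	unsigned long nr_pages;	/* number of pages it spans */
	bool migrated;		/* true once the page has really been moved */
};

enum model_result { MODEL_DONE, MODEL_FAILED, MODEL_STUCK };

/* Stand-in: reports success (0) but does not move the page. */
static int migrate_noop_success(struct fake_range *r)
{
	(void)r;
	return 0;
}

/* Stand-in: reports failure (non-zero) so the caller can bail out. */
static int migrate_failure(struct fake_range *r)
{
	(void)r;
	return -1;
}

/* Stand-in: really moves the page and reports success. */
static int migrate_real_success(struct fake_range *r)
{
	r->migrated = true;
	return 0;
}

/*
 * Simplified model of the retry loop. max_iters exists only so the model
 * can report the endless loop instead of spinning forever.
 */
static enum model_result isolate_model(struct fake_range *r,
				       unsigned long boundary_pfn,
				       int (*migrate)(struct fake_range *),
				       int max_iters)
{
	unsigned long pfn = r->head_pfn;
	int iters = 0;

	assert(max_iters > 0);
	while (pfn < boundary_pfn) {
		if (++iters > max_iters)
			return MODEL_STUCK;	/* same pfn every time */
		if (migrate(r) != 0)
			return MODEL_FAILED;	/* models "goto failed" */
		if (r->migrated)
			pfn = r->head_pfn + r->nr_pages;
		/* otherwise pfn is unchanged and the loop retries */
	}
	return MODEL_DONE;
}
```

With the values from the printout (head_pfn=2151120384, nr_pages=512, boundary_pfn=2151120896), the model ends in MODEL_STUCK for migrate_noop_success, MODEL_FAILED for migrate_failure, and MODEL_DONE for migrate_real_success.
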
Can you also try the patch below to see if it fixes the infinite loop?
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b3f074d1682e..abde1877bbcb 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -417,10 +417,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
order = 0;
outer_pfn = pfn;
while (!PageBuddy(pfn_to_page(outer_pfn))) {
- if (++order >= MAX_ORDER) {
- outer_pfn = pfn;
- break;
- }
+ /* abort if the free page cannot be found */
+ if (++order >= MAX_ORDER)
+ goto failed;
outer_pfn &= ~0UL << order;
}
pfn = outer_pfn;
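
For reference, the search that the patch changes can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel implementation: find_outer_free_page(), is_buddy_fn and the two predicates are hypothetical stand-ins for PageBuddy(pfn_to_page(...)), and MAX_ORDER is assumed to be 11. The loop rounds the pfn down to the next larger order until it lands on a free page; with the patch, running out of orders is reported as a failure instead of falling back to the original pfn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ORDER 11	/* assumption: the common default, not taken from this thread */

/* Hypothetical stand-in for PageBuddy(pfn_to_page(pfn)). */
typedef bool (*is_buddy_fn)(unsigned long pfn);

/*
 * Walk up the orders, aligning the candidate pfn down at each step, until a
 * free (buddy) page is found. Returning false models the patched
 * "goto failed" path; the old code instead reset outer_pfn to pfn and
 * carried on.
 */
static bool find_outer_free_page(unsigned long pfn, is_buddy_fn is_buddy,
				 unsigned long *outer_pfn)
{
	unsigned long candidate = pfn;
	int order = 0;

	assert(outer_pfn != NULL);
	while (!is_buddy(candidate)) {
		if (++order >= MAX_ORDER)
			return false;		/* free page cannot be found */
		candidate &= ~0UL << order;	/* align down to this order */
	}
	*outer_pfn = candidate;
	return true;
}

/* Predicate: no free page anywhere, as if migration freed nothing. */
static bool never_buddy(unsigned long pfn)
{
	(void)pfn;
	return false;
}

/* Predicate: a free page starts at the order-10 aligned pfn below the input. */
static bool buddy_at_order10_head(unsigned long pfn)
{
	return pfn == 2151119872UL;
}
```

Using the pfn from the printout, 2151120384 is aligned to order 9 (512 pages) but not to order 10, so the mask leaves it unchanged for orders 1 through 9 and rounds it down to 2151119872 at order 10. With never_buddy the search fails, which is the case the patch turns into an abort.
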
--
Best Regards,
Yan, Zi