[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200522070114.GE26955@MiWiFi-R3L-srv>
Date:   Fri, 22 May 2020 15:01:14 +0800
From:   Baoquan He <bhe@...hat.com>
To:     Mike Rapoport <rppt@...ux.ibm.com>, mgorman@...e.de
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        akpm@...ux-foundation.org, cai@....pw, mhocko@...nel.org
Subject: Re: [PATCH] mm/compaction: Fix the incorrect hole in
 fast_isolate_freepages()
On 05/21/20 at 08:18pm, Mike Rapoport wrote:
> On Thu, May 21, 2020 at 11:52:25PM +0800, Baoquan He wrote:
> > On 05/21/20 at 12:26pm, Mike Rapoport wrote:
> > > > For this kind of e820 reserved range, it won't be added to memblock allocator.
> > > > However, init_unavailable_mem() will initialize to add them into node 0,
> > > > zone 0. Before that commit, later, memmap_init() will add e820 reserved
> > > > ranges into the zone where they are contained, because it can pass
> > > > the checking of early_pfn_valid() and early_pfn_in_nid(). In this case,
> > > > the e820 reserved range where fault page 0x2a800000 is located is added
> > > > into DMA32 zone. After that commit, the e820 reserved rgions are kept
> > > > in node 0, zone 0, since we iterate over memblock regions to iniatialize
> > > > in memmap_init() instead, their node and zone won't be changed.
> > > 
> > > I wonder why woudn't we add the reserved memory to memblock from the
> > > very beginning...
> > > I've tried to undestand why this was not done, but I couldn't find the
> > > reasoning behind that.
> > 
> > I have added some explanation when reply to Mel. Please check that in
> > that thread.
> 
> What I meant was that I've tried to find an explanation in the kernel logs
> for why the reserved areas are not added to memblock.memory on x86. I
> didn't mean that the your patch description lacks this explanation :)
Ah, I misunderstood it, sorry about that.
> 
> > As I said, the unavailable range includes firmware reserved ranges, and
> > holes inside one boot memory section, if that boot memory section haves
> > useable memory range, and firmware reserved ranges, and holes. Adding
> > them all into memblock seems a little unreasonable, since they are never
> > used by system in memblock, buddy or high level memory allocator. But I
> > can see that adding them into memblock may have the same effect as the
> > old code which is beofre your your patchset applied. Let's see if Mel or
> > other people have some saying. I pesonally would not suggest doing it
> > like this though.
> 
> Adding reserved regions to memblock.memory will not have the same effect
> as the old code. We anyway have to initialize struct page for these
> areas, but unlike the old code we don't need to run them by the
> early_pfn_in_nid() checks and we still get rid the
> CONFIG_NODES_SPAN_OTHER_NODES option.
Hmm, I mean adding them to memblock will let us have the same result,
they are added into the node, zone where they should be, and marked as
reserved, just as the old code did.
Rethink about this, seems adding them into memblock is doable. But
we may not need to add them from e820 reserved range, since that will
skip hole range which share the same section with usable range, and may
need to change code in different ARCHes. How about this:
We add them into memblock in init_unavailable_range(), memmap_init() will
add them into the right node and zone, reserve_bootmem_region() will
initialize them and mark them as Reserved.
>From d019d0f9e7c958542dfcb142f93d07fcce6c7c22 Mon Sep 17 00:00:00 2001
From: Baoquan He <bhe@...hat.com>
Date: Fri, 22 May 2020 14:36:13 +0800
Subject: [PATCH] mm/page_alloc.c: Add unavailable ranges into memblock
These unavailable ranges shares the same section with the usable range
in boot memory, e.g the firmware reserved ranges, and holes.
Previously, they are added into node 0, zone 0 in function
init_unavailable_range(), and marked as Reserved. Later, in function
memmap_init(), they will be added to appropriate node and zone, where
they are covered.
However, after the patchset ("mm: rework free_area_init*() funcitons")
is applied, we change to iterate over memblock regions. These unavailable
ranges are skipped, and the node and zone adjustment won't be done any
more as the old code did. This cause a crash in compaction which is triggered
by VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn)).
So let's add these unavailable ranges into memblock and reserve them
in init_unavailable_range() instead. With this change, they will be added
into appropriate node and zone in memmap_init(), and initialized in
reserve_bootmem_region() just like any other memblock reserved regions.
Signed-off-by: Baoquan He <bhe@...hat.com>
---
 mm/page_alloc.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 603187800628..3973b5fdfe3f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6925,7 +6925,7 @@ static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
 static void __init init_unavailable_mem(void)
 {
 	phys_addr_t start, end;
-	u64 i, pgcnt;
+	u64 i, pgcnt, size;
 	phys_addr_t next = 0;
 
 	/*
@@ -6934,9 +6934,11 @@ static void __init init_unavailable_mem(void)
 	pgcnt = 0;
 	for_each_mem_range(i, &memblock.memory, NULL,
 			NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
-		if (next < start)
-			pgcnt += init_unavailable_range(PFN_DOWN(next),
-							PFN_UP(start));
+		if (next < start) {
+			size = PFN_UP(start) - PFN_DOWN(next);
+			memblock_add(PFN_DOWN(next), size);
+			memblock_reserve(PFN_DOWN(next), size);
+		}
 		next = end;
 	}
 
@@ -6947,8 +6949,11 @@ static void __init init_unavailable_mem(void)
 	 * considered initialized. Make sure that memmap has a well defined
 	 * state.
 	 */
-	pgcnt += init_unavailable_range(PFN_DOWN(next),
-					round_up(max_pfn, PAGES_PER_SECTION));
+	size = round_up(max_pfn, PAGES_PER_SECTION) - PFN_DOWN(next);
+	if (size) {
+		memblock_add(PFN_DOWN(next), size);
+		memblock_reserve(PFN_DOWN(next), size);
+	}
 
 	/*
 	 * Struct pages that do not have backing memory. This could be because
-- 
2.17.2
Powered by blists - more mailing lists
 
