Message-ID: <20201126194426.GU123287@linux.ibm.com>
Date: Thu, 26 Nov 2020 21:44:26 +0200
From: Mike Rapoport <rppt@...ux.ibm.com>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: David Hildenbrand <david@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>, Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
Qian Cai <cai@....pw>, Michal Hocko <mhocko@...nel.org>,
linux-kernel@...r.kernel.org, Baoquan He <bhe@...hat.com>
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set
pageblock_skip on reserved pages
On Thu, Nov 26, 2020 at 01:29:30PM -0500, Andrea Arcangeli wrote:
> On Thu, Nov 26, 2020 at 11:36:02AM +0200, Mike Rapoport wrote:
> > memory.reserved cannot be calculated automatically. It represents all
> > the memory allocations made before page allocator is up. And as
> > memblock_reserve() is the most basic to allocate memory early at boot we
> > cannot really delete it ;-)
>
> Well this explanation totally covers "memory allocated early at
> boot" that overlaps with memblock.memory.
>
> Does the E820_TYPE_SOFT_RESERVED range added to memblock.reserve
> define as "memory allocated early at boot"?
>
> Does it overlap ranges added with any RAM added to memblock.memory?
>
> 		if (entry->type == E820_TYPE_SOFT_RESERVED)
> 			memblock_reserve(entry->addr, entry->size);
>
> 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> 			continue;
>
> 		memblock_add(entry->addr, entry->size);
>
> To me the above looks like it's being used for something completely
> different from reserving "memory allocated early at boot".
>
> Why is there no warning at boot if there's no overlap between
> memblock.reserve and memblock.memory?
> My question about memblock.reserve is really about the non overlapping
> ranges: why are ranges non overlapping with memblock.memory regions,
> added to memblock.reserve, and why aren't those calculated
> automatically as reverse of memblock.memory?

Once there was this comment in arch/x86/kernel/e820.c:

	/*
	 * all !E820_TYPE_RAM ranges (including gap ranges) are put
	 * into memblock.reserved to make sure that struct pages in
	 * such regions are not left uninitialized after bootup.
	 */

I presume there were struct pages that corresponded to some unusable
memory and they were not initialized, so the solution was to add them to
memblock.reserved.

> It's easy to see that when memblock.reserve overlaps fully, it makes
> perfect sense and it has to stay for it. I was really only thinking at
> the usage like above of memblock_reserve that looks like it should be
> turned into a noop and deleted.

TBH, the whole interaction between e820 and memblock still puzzles me,
and I can only make educated guesses as to why some ranges here are
memblock_reserve()'d and some are memblock_add()ed.

I think what should be there is that e820 entries that are essentially
RAM, used by BIOS or not, should be listed in memblock.memory. Then
using memblock_reserve() for the parts that BIOS claimed for itself would
have the same semantics as for memory allocated by the kernel.

I.e. if there is a DIMM from 0 to, say, 512M, memblock.memory will have a
range [0, 512M], and areas such as 0x000-0xfff and 0x9d000-0x9ffff will be
in memblock.reserved.

Then in page_alloc.c we'll know that we have a physical memory bank from
0 to 512M but there are some ranges in it that we cannot use.
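
To make that layout concrete, here is a toy userspace model of the idea. It
is only a sketch and not kernel code: every toy_* name and the three-way
entry classification are made up for illustration, the real memblock API and
the e820 types differ, and the input map is assumed to be sorted and
non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REGIONS 16

/* Hypothetical classification of a firmware map entry. */
enum toy_type {
	TOY_RAM,		/* RAM the kernel may use */
	TOY_RAM_FIRMWARE,	/* RAM-backed, but claimed by BIOS */
	TOY_NOT_RAM,		/* no memory behind it, e.g. an MMIO hole */
};

struct toy_entry { uint64_t addr, size; enum toy_type type; };
struct toy_range { uint64_t base, size; };	/* half-open: [base, base + size) */

struct toy_regions {
	struct toy_range r[TOY_MAX_REGIONS];
	size_t cnt;
};

/* Stand-in for memblock.memory and memblock.reserved. */
struct toy_memblock { struct toy_regions memory, reserved; };

/* Append a range, merging it into the previous one when they touch. */
static void toy_add(struct toy_regions *t, uint64_t base, uint64_t size)
{
	if (t->cnt > 0) {
		struct toy_range *last = &t->r[t->cnt - 1];

		if (last->base + last->size == base) {
			last->size += size;
			return;
		}
	}
	assert(t->cnt < TOY_MAX_REGIONS);
	t->r[t->cnt].base = base;
	t->r[t->cnt].size = size;
	t->cnt++;
}

static int toy_contains(const struct toy_regions *t, uint64_t addr)
{
	for (size_t i = 0; i < t->cnt; i++)
		if (addr >= t->r[i].base && addr - t->r[i].base < t->r[i].size)
			return 1;
	return 0;
}

/*
 * Everything that is RAM goes into "memory", whether BIOS uses it or not.
 * The parts BIOS claimed are additionally put into "reserved", exactly
 * like an early kernel allocation would be. Non-RAM entries are skipped.
 */
void toy_populate(struct toy_memblock *mb, const struct toy_entry *map, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].type == TOY_NOT_RAM)
			continue;
		toy_add(&mb->memory, map[i].addr, map[i].size);
		if (map[i].type == TOY_RAM_FIRMWARE)
			toy_add(&mb->reserved, map[i].addr, map[i].size);
	}
}

/* Usable means: part of a memory bank and not reserved. */
int toy_is_usable(const struct toy_memblock *mb, uint64_t addr)
{
	return toy_contains(&mb->memory, addr) &&
	       !toy_contains(&mb->reserved, addr);
}
```

Feeding it the 512M example as four entries (0x0-0xfff firmware,
0x1000-0x9cfff RAM, 0x9d000-0x9ffff firmware, 0xa0000-512M RAM) leaves a
single merged bank in the toy "memory" list and two ranges in the toy
"reserved" list, so a consumer sees one contiguous bank with two holes it
must not hand out.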

I suggested this back when the issue with compaction was first reported,
but Baoquan mentioned that there are systems that cannot even tolerate
having BIOS reserved areas in the page tables, so I did not pursue it
further.

Now I'm thinking of resurrecting this patch with some additions so that
init_mem_mapping() could skip such regions.

[1] https://lore.kernel.org/lkml/20200528090731.GI20045@MiWiFi-R3L-srv/#t

> Thanks,
> Andrea
>
--
Sincerely yours,
Mike.