Message-ID: <c1559dcb-7953-fe08-604a-5eaf202bf662@redhat.com>
Date: Mon, 15 Feb 2021 09:45:30 +0100
From: David Hildenbrand <david@...hat.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Baoquan He <bhe@...hat.com>, Borislav Petkov <bp@...en8.de>,
Chris Wilson <chris@...is-wilson.co.uk>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Łukasz Majczak <lma@...ihalf.com>,
Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
Mike Rapoport <rppt@...ux.ibm.com>, Qian Cai <cai@....pw>,
"Sarvela, Tomi P" <tomi.p.sarvela@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, stable@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH v5 1/1] mm: refactor initialization of struct page for
holes in memory layout
On 14.02.21 18:29, Mike Rapoport wrote:
> On Fri, Feb 12, 2021 at 10:56:19AM +0100, David Hildenbrand wrote:
>> On 12.02.21 10:55, David Hildenbrand wrote:
>>> On 08.02.21 12:08, Mike Rapoport wrote:
>>>> +#ifdef CONFIG_SPARSEMEM
>>>> + /*
>>>> + * Sections in the memory map may not match actual populated
>>>> + * memory, extend the node span to cover the entire section.
>>>> + */
>>>> + *start_pfn = round_down(*start_pfn, PAGES_PER_SECTION);
>>>> + *end_pfn = round_up(*end_pfn, PAGES_PER_SECTION);
>>>
>>> Does that mean that we might create overlapping zones when one node
>>
>> s/overlapping zones/overlapping nodes/
>>
>>> starts in the middle of a section and the other one ends in the middle
>>> of a section?
>>
>>> Could it be a problem? (e.g., would we have to look at neighboring nodes
>>> when making the decision to extend, and how far to extend?)
>
> Having a node end/start in a middle of a section would be a problem, but in
> this case I don't see a way to detect how a node should be extended :(
Running QEMU with something like:
...
-m 8G \
-smp sockets=2,cores=2 \
-object memory-backend-ram,id=bmem0,size=4160M \
-object memory-backend-ram,id=bmem1,size=4032M \
-numa node,nodeid=0,cpus=0-1,memdev=bmem0 -numa node,nodeid=1,cpus=2-3,memdev=bmem1 \
...
creates such a setup.
With an older kernel:
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000023fffffff] usable
[...]
[ 0.002506] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.002508] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.002509] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x143ffffff]
[ 0.002510] ACPI: SRAT: Node 1 PXM 1 [mem 0x144000000-0x23fffffff]
[ 0.002511] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x00000000-0xbfffffff]
[ 0.002513] NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x143ffffff] -> [mem 0x00000000-0x143ffffff]
[ 0.002519] NODE_DATA(0) allocated [mem 0x143fd5000-0x143ffffff]
[ 0.002669] NODE_DATA(1) allocated [mem 0x23ffd2000-0x23fffcfff]
[ 0.017947] memblock: reserved range [0x0000000000000000-0x0000000000001000] is not in memory
[ 0.017953] memblock: reserved range [0x000000000009f000-0x0000000000100000] is not in memory
[ 0.017956] Zone ranges:
[ 0.017957] DMA [mem 0x0000000000000000-0x0000000000ffffff]
[ 0.017958] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.017960] Normal [mem 0x0000000100000000-0x000000023fffffff]
[ 0.017961] Device empty
[ 0.017962] Movable zone start for each node
[ 0.017964] Early memory node ranges
[ 0.017965] node 0: [mem 0x0000000000000000-0x00000000bffdffff]
[ 0.017966] node 0: [mem 0x0000000100000000-0x0000000143ffffff]
[ 0.017967] node 1: [mem 0x0000000144000000-0x000000023fffffff]
[ 0.017969] Initmem setup node 0 [mem 0x0000000000000000-0x0000000143ffffff]
[ 0.017971] On node 0 totalpages: 1064928
[ 0.017972] DMA zone: 64 pages used for memmap
[ 0.017973] DMA zone: 21 pages reserved
[ 0.017974] DMA zone: 4096 pages, LIFO batch:0
[ 0.017994] DMA32 zone: 12224 pages used for memmap
[ 0.017995] DMA32 zone: 782304 pages, LIFO batch:63
[ 0.022281] DMA32: Zeroed struct page in unavailable ranges: 32
[ 0.022286] Normal zone: 4352 pages used for memmap
[ 0.022287] Normal zone: 278528 pages, LIFO batch:63
[ 0.023769] Initmem setup node 1 [mem 0x0000000144000000-0x000000023fffffff]
[ 0.023774] On node 1 totalpages: 1032192
[ 0.023775] Normal zone: 16128 pages used for memmap
[ 0.023775] Normal zone: 1032192 pages, LIFO batch:63
With current next/master:
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000023fffffff] usable
[...]
[ 0.002419] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.002421] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.002422] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x143ffffff]
[ 0.002423] ACPI: SRAT: Node 1 PXM 1 [mem 0x144000000-0x23fffffff]
[ 0.002424] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x00000000-0xbfffffff]
[ 0.002426] NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x143ffffff] -> [mem 0x00000000-0x143ffffff]
[ 0.002432] NODE_DATA(0) allocated [mem 0x143fd5000-0x143ffffff]
[ 0.002583] NODE_DATA(1) allocated [mem 0x23ffd2000-0x23fffcfff]
[ 0.017722] Zone ranges:
[ 0.017726] DMA [mem 0x0000000000000000-0x0000000000ffffff]
[ 0.017728] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.017729] Normal [mem 0x0000000100000000-0x000000023fffffff]
[ 0.017731] Device empty
[ 0.017732] Movable zone start for each node
[ 0.017734] Early memory node ranges
[ 0.017735] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.017736] node 0: [mem 0x0000000000100000-0x00000000bffdffff]
[ 0.017737] node 0: [mem 0x0000000100000000-0x0000000143ffffff]
[ 0.017738] node 1: [mem 0x0000000144000000-0x000000023fffffff]
[ 0.017741] Initmem setup node 0 [mem 0x0000000000000000-0x0000000147ffffff]
[ 0.017742] On node 0 totalpages: 1064830
[ 0.017743] DMA zone: 64 pages used for memmap
[ 0.017744] DMA zone: 21 pages reserved
[ 0.017745] DMA zone: 3998 pages, LIFO batch:0
[ 0.017765] DMA zone: 98 pages in unavailable ranges
[ 0.017766] DMA32 zone: 12224 pages used for memmap
[ 0.017766] DMA32 zone: 782304 pages, LIFO batch:63
[ 0.022042] DMA32 zone: 32 pages in unavailable ranges
[ 0.022046] Normal zone: 4608 pages used for memmap
[ 0.022047] Normal zone: 278528 pages, LIFO batch:63
[ 0.023601] Normal zone: 16384 pages in unavailable ranges
[ 0.023606] Initmem setup node 1 [mem 0x0000000140000000-0x000000023fffffff]
[ 0.023608] On node 1 totalpages: 1032192
[ 0.023609] Normal zone: 16384 pages used for memmap
[ 0.023609] Normal zone: 1032192 pages, LIFO batch:63
[ 0.029267] Normal zone: 16384 pages in unavailable ranges
In this setup, one node ends in the middle of a section (+64MB), the
other one starts in the middle of the same section (+64MB).
After your patch, the node spans overlap in that one section: node 0 now
ends at 0x147ffffff while node 1 starts at 0x140000000.
I can spot that each node still has the same number of present pages and
that each node now has exactly 64MB of unavailable pages (the extra ones spanned).
So at least here, it looks like the machinery is still doing the right thing?
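For reference, the rounding that produces the new spans can be reproduced
outside the kernel. This is only a sketch: it assumes the x86-64 defaults of
4 KiB pages (PAGE_SHIFT 12) and 128 MiB sections (SECTION_SIZE_BITS 27), and
the helper names (pfn_round_down, pfn_round_up, pfn_to_phys,
check_spans_against_log) are made up for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 defaults: 4 KiB pages, 128 MiB sparsemem sections. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 27
#define PAGES_PER_SECTION (1ULL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Hypothetical helpers: power-of-two rounding of a PFN to a section boundary. */
static inline uint64_t pfn_round_down(uint64_t pfn)
{
	return pfn & ~(PAGES_PER_SECTION - 1);
}

static inline uint64_t pfn_round_up(uint64_t pfn)
{
	return (pfn + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1);
}

static inline uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Compare the computed spans with the "Initmem setup" lines in the log above. */
static inline void check_spans_against_log(void)
{
	/* Node 0 ends (exclusive) and node 1 starts at 0x144000000. */
	uint64_t boundary_pfn = 0x144000000ULL >> PAGE_SHIFT;
	uint64_t node0_end_pfn = pfn_round_up(boundary_pfn);
	uint64_t node1_start_pfn = pfn_round_down(boundary_pfn);

	/* Node 0 span now ends at 0x147ffffff, node 1 span starts at 0x140000000. */
	assert(pfn_to_phys(node0_end_pfn) - 1 == 0x147ffffffULL);
	assert(pfn_to_phys(node1_start_pfn) == 0x140000000ULL);

	/* Each node gains 16384 extra spanned pages, i.e. 64MB. */
	assert(node0_end_pfn - boundary_pfn == 16384);
	assert(boundary_pfn - node1_start_pfn == 16384);

	/* The two spans overlap in exactly one section. */
	assert(node0_end_pfn - node1_start_pfn == PAGES_PER_SECTION);
}
```

Calling check_spans_against_log() from a small main() passes silently, which
matches the 16384 "pages in unavailable ranges" reported for the Normal zone
of each node.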
--
Thanks,
David / dhildenb