Message-ID: <bda4cf52-d81a-4935-b45a-09e9439e33b6@redhat.com>
Date: Tue, 18 Feb 2025 21:57:06 +0100
From: David Hildenbrand <david@...hat.com>
To: Gregory Price <gourry@...rry.net>
Cc: Yang Shi <shy828301@...il.com>, lsf-pc@...ts.linux-foundation.org,
linux-mm@...ck.org, linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: CXL Boot to Bash - Section 3: Memory (block) Hotplug
On 18.02.25 21:25, Gregory Price wrote:
> On Tue, Feb 18, 2025 at 08:25:59PM +0100, David Hildenbrand wrote:
>> On 18.02.25 19:04, Gregory Price wrote:
>>
>> Hm?
>>
>> If you enable memmap_on_memory, we will place the memmap on that carved-out
>> region, independent of ZONE_NORMAL/ZONE_MOVABLE etc. It's the "altmap".
>>
>> Reason that we can place the memmap on a ZONE_MOVABLE is because, although
>> it is "unmovable", we told memory offlining code that it doesn't have to
>> care about offlining that memmap carveout, there is no migration to be done.
>> Just offline the block (memmap gets stale) and remove that block (memmap
>> gets removed).
>>
>> If there is a reason where we carve out the memmap and *not* use it, that
>> case must be fixed.
>>
>
> Hm, I managed to trace down the wrong path on this particular code.
>
> I will go back and redo my tests to sanity check, but here's what I
> would expect to see:
>
> 1) if memmap_on_memory is off, and hotplug capacity (node1) is
> zone_movable - then zone_normal (node0) should have N pages
> accounted in nr_memmap_pages
Right, we'll allocate the memmap from the buddy, which in your scenario
ends up allocating from ZONE_NORMAL on node 0.
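
The overhead Gregory expects to see can be sketched numerically. The 4 KiB
base page size and 64-byte struct page are assumptions (the usual x86-64
values), not something stated in the thread:

```python
# Sketch of the memmap overhead when memmap_on_memory is off: the
# struct page array for the hotplugged capacity is allocated from the
# buddy, i.e. from ZONE_NORMAL on an already-online node.
# Assumes 4 KiB base pages and sizeof(struct page) == 64.

PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64

def memmap_bytes(hotplugged_bytes):
    """Bytes of struct page metadata needed for a given amount of memory."""
    return (hotplugged_bytes // PAGE_SIZE) * STRUCT_PAGE_SIZE

GiB = 1 << 30
# 256 GiB of hotplugged memory needs 4 GiB of memmap from the buddy.
print(memmap_bytes(256 * GiB) // GiB)  # -> 4
```

That matches the 4 GiB drop on node 0 expected in (1a) above when the
blocks are removed again.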
>
> 1a) when dropping these memory blocks, I should see node0 memory
> use drop by 4GB - since this is just GFP_KERNEL pages.
I assume you mean "when hotunplugging them". Yes, we should be freeing the memmap back to the buddy.
>
> 2) if memmap_on_memory is on, and hotplug capacity (node1) is
> zone_movable - then each memory block (256MB) should appear
> as 252MB (-4MB of 64-byte page structs). For 256GB (my system)
> I should see a total of 252GB of onlined memory (-4GB of page struct)
In memory_block_online(), we have:
	/*
	 * Account once onlining succeeded. If the zone was unpopulated, it is
	 * now already properly populated.
	 */
	if (nr_vmemmap_pages)
		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
					  nr_vmemmap_pages);
So we'd add the vmemmap pages to
* zone->present_pages
* zone->zone_pgdat->node_present_pages
(mhp_init_memmap_on_memory() moved the vmemmap pages to ZONE_MOVABLE)
However, we don't add them to
* zone->managed_pages
* totalram pages
/proc/zoneinfo would show them as present but not managed.
/proc/meminfo would not include them in MemTotal
We could adjust the latter two, if there is a problem.
(just needs some adjust_managed_page_count() calls)
So yes, staring at MemTotal, you should see an increase by 252 MiB right now.
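
A back-of-the-envelope model of that accounting, per 256 MiB block (the
4 KiB page size and 64-byte struct page are assumptions, as before):

```python
# Per-memory-block accounting with memmap_on_memory=on: the vmemmap is
# carved out of the block itself, counted as present but not managed.
# Assumes 256 MiB blocks, 4 KiB pages, sizeof(struct page) == 64.

MiB = 1 << 20
BLOCK_SIZE = 256 * MiB
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64

vmemmap = (BLOCK_SIZE // PAGE_SIZE) * STRUCT_PAGE_SIZE

present = BLOCK_SIZE            # vmemmap pages go into zone->present_pages
managed = BLOCK_SIZE - vmemmap  # but not zone->managed_pages / MemTotal

print(vmemmap // MiB, managed // MiB)  # -> 4 252
```

So each onlined block contributes 252 MiB to MemTotal, and a 256 GiB
system ends up 4 GiB short of its nominal capacity there.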
>
> 2a) when dropping these memory blocks, I should see node0 memory use
> stay the same - since it was vmemmap usage.
Yes.
>
> I will double check that this isn't working as expected, and i'll double
> check for a build option as well.
>
> stupid question - it sorta seems like you'd want this as the default
> setting for driver-managed hotplug memory blocks, but I suppose for
> very small blocks there's problems (as described in the docs).
The issue is that it is per-memblock. So you'll never have 1 GiB ranges
of consecutive usable memory (e.g., 1 GiB hugetlb page).
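
The fragmentation problem can be illustrated numerically: because every
block starts with its own vmemmap carveout, the longest run of usable
memory never reaches a gigantic-page size (block and carveout sizes below
are assumed, matching the figures earlier in the thread):

```python
# With memmap_on_memory, each memory block begins with a vmemmap
# carveout, chopping usable memory into (block - vmemmap) sized runs.
# Assumes 256 MiB blocks with a 4 MiB vmemmap per block.

MiB = 1 << 20
GiB = 1 << 30
BLOCK_SIZE = 256 * MiB
VMEMMAP = 4 * MiB

longest_usable_run = BLOCK_SIZE - VMEMMAP  # carveout repeats every block
print(longest_usable_run >= GiB)  # -> False: a 1 GiB hugetlb page never fits
```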
>
> :thinking: - is it silly to suggest maybe a per-driver memmap_on_memory
> setting rather than just a global setting? For CXL capacity, this seems
> like a no-brainer since blocks can't be smaller than 256MB (per spec).
I thought we had that? See MHP_MEMMAP_ON_MEMORY set by dax/kmem.
IIRC, the global toggle must be enabled for the driver option to be considered.
--
Cheers,
David / dhildenb