Message-ID: <5788c8f5-8c2a-45c1-c374-1bf87c189c86@redhat.com>
Date: Mon, 31 Jan 2022 12:40:24 +0100
From: David Hildenbrand <david@...hat.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Michal Hocko <mhocko@...e.com>,
Rafael Parra <rparrazo@...hat.com>
Subject: Re: [PATCH v1 2/2] drivers/base/memory: determine and store zone for
single-zone memory blocks
On 31.01.22 12:29, Oscar Salvador wrote:
> On Fri, Jan 28, 2022 at 04:26:20PM +0100, David Hildenbrand wrote:
>> For memory hot(un)plug, we only really care about memory blocks that:
>> * span a single zone (and, thereby, a single node)
>> * are completely System RAM (IOW, no holes, no ZONE_DEVICE)
>> If one of these conditions is not met, we reject memory offlining.
>> Hotplugged memory blocks (starting out offline) always meet both
>> conditions.
Thanks for the review, Oscar!
>
> This has always been hard for me to follow, so bear with me.
>
> I remember we changed the memory-hotplug policy not long ago, wrt. what
> we can online/offline, so we could get rid of certain assumptions like
> "there are no holes in this memblock, so it can go" etc.
Yes, end of 2019 via c5e79ef561b0 ("mm/memory_hotplug.c: don't allow to
online/offline memory blocks with holes").
>
> AFAIR, we can only offline if the memory
>
> 1) belongs to a single node ("which is always the case for
> hotplugged-memory, boot memory is trickier")
> 2) does not have any holes
> 3) spans a single zone
>
> These are the only requirements we have atm, right?
The most prominent core requirements, yes, leaving memory notifiers out
of the picture.
3) implies 1) as zones are per-node.
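Conceptually, the combined check boils down to something like the
following (illustration only, with a made-up helper name, and ignoring
the System-RAM-vs-ZONE_DEVICE distinction; the real code reasons about
zone/section ranges instead of touching every page):

static bool range_is_single_zone_ram(unsigned long start_pfn,
                                     unsigned long nr_pages)
{
        struct zone *zone = NULL;
        unsigned long pfn;

        for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
                if (!pfn_valid(pfn))
                        return false;   /* hole -> 2) violated */
                if (!zone)
                        zone = page_zone(pfn_to_page(pfn));
                else if (page_zone(pfn_to_page(pfn)) != zone)
                        return false;   /* multiple zones -> 3) violated */
        }
        /* A single zone implies a single node -> 1) holds as well. */
        return true;
}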
>
> By default, hotplugged memory already complies with all three;
> only with ZONE_DEVICE stuff might we violate 2) and 3).
>
>> There are three scenarios to handle:
> ...
> ...
>
>> @@ -225,6 +226,9 @@ static int memory_block_offline(struct memory_block *mem)
>> unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
>> int ret;
>>
>> + if (!mem->zone)
>> + return -EBUSY;
>
> Shouldn't we return -EINVAL? I mean, -EBUSY reads like this might be a
> temporary error which might get fixed later on, but that isn't the case.
>
>> @@ -234,7 +238,7 @@ static int memory_block_offline(struct memory_block *mem)
>> -nr_vmemmap_pages);
>>
>> ret = offline_pages(start_pfn + nr_vmemmap_pages,
>> - nr_pages - nr_vmemmap_pages, mem->group);
>> + nr_pages - nr_vmemmap_pages, mem->zone, mem->group);
>
> Why not passing the node as well?
The zone implies the node, and the prototype now matches that of
online_pages(). So if we ever wanted to change that, we should do it for
both functions, but I don't necessarily see the need for it.
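For reference, the rough shape of the two prototypes after this patch
(a sketch, the tree has the authoritative signatures):

int online_pages(unsigned long pfn, unsigned long nr_pages,
                 struct zone *zone, struct memory_group *group);
int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
                  struct zone *zone, struct memory_group *group);

Anything that really needs the node can derive it via zone_to_nid(zone).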
>
>> +static struct zone *early_node_zone_for_memory_block(struct memory_block *mem,
>> + int nid)
>> +{
>> + const unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
>> + const unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
>> + struct zone *zone, *matching_zone = NULL;
>> + pg_data_t *pgdat = NODE_DATA(nid);
>
> I was about to complain because in init_memory_block() you call
> early_node_zone_for_memory_block() with nid == NUMA_NO_NODE, but then
> I saw that NODE_DATA on !CONFIG_NUMA falls back to contig_page_data.
> So, I guess we cannot really reach this on CONFIG_NUMA machines with nid
> being NUMA_NO_NODE, right? (do we want to add a check just in case?)
>
Yes, on CONFIG_NUMA this is only called via memory_block_set_nid(),
which itself is only available with CONFIG_NUMA, and calling it with
NUMA_NO_NODE would be a BUG.
(before sending this out I even had a BUG_ON() in memory_block_set_nid()
to verify that, but I removed it because BUG_ONs are frowned upon.)
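If we do want a softer check, here is a sketch of how it could look,
together with the zone walk that follows the part quoted above
(illustration, not the literal patch body):

static struct zone *early_node_zone_for_memory_block(struct memory_block *mem,
                                                     int nid)
{
        const unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
        const unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
        struct zone *zone, *matching_zone = NULL;
        pg_data_t *pgdat;

        /* Softer alternative to the BUG_ON() I dropped. */
        if (WARN_ON_ONCE(nid == NUMA_NO_NODE))
                return NULL;
        pgdat = NODE_DATA(nid);

        /*
         * Accept the range only if exactly one of the node's zones
         * intersects it; otherwise we cannot determine a single zone.
         */
        for (zone = pgdat->node_zones;
             zone < pgdat->node_zones + MAX_NR_ZONES; zone++) {
                if (!zone_intersects(zone, start_pfn, nr_pages))
                        continue;
                if (matching_zone)
                        return NULL;
                matching_zone = zone;
        }
        return matching_zone;
}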
--
Thanks,
David / dhildenb