Message-ID: <5788c8f5-8c2a-45c1-c374-1bf87c189c86@redhat.com>
Date: Mon, 31 Jan 2022 12:40:24 +0100
From: David Hildenbrand <david@...hat.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Michal Hocko <mhocko@...e.com>,
Rafael Parra <rparrazo@...hat.com>
Subject: Re: [PATCH v1 2/2] drivers/base/memory: determine and store zone for
single-zone memory blocks
On 31.01.22 12:29, Oscar Salvador wrote:
> On Fri, Jan 28, 2022 at 04:26:20PM +0100, David Hildenbrand wrote:
>> For memory hot(un)plug, we only really care about memory blocks that:
>> * span a single zone (and, thereby, a single node)
>> * are completely System RAM (IOW, no holes, no ZONE_DEVICE)
>> If one of these conditions is not met, we reject memory offlining.
>> Hotplugged memory blocks (starting out offline) always meet both
>> conditions.
Thanks for the review, Oscar!
>
> This has always been hard for me to follow, so bear with me.
>
> I remember we changed the memory-hotplug policy not long ago, wrt. what
> we can online/offline, so we could get rid of certain assumptions like
> "there are no holes in this memblock, so it can go" etc.
Yes, end of 2019 via c5e79ef561b0 ("mm/memory_hotplug.c: don't allow to
online/offline memory blocks with holes").
>
> AFAIR, we can only offline if the memory
>
> 1) belongs to a single node ("which is always the case for
> hotplugged-memory, boot memory is trickier")
> 2) does not have any holes
> 3) spans a single zone
>
> These are the only requirements we have atm, right?
The most prominent core requirements, yes, leaving memory notifiers out
of the picture.
3) implies 1) as zones are per-node.
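Conceptually, the combined check boils down to something like the
following (illustration only, with a made-up helper name, and ignoring
the System-RAM-vs-ZONE_DEVICE distinction; the real code reasons about
zone/section ranges instead of touching every page):

static bool range_is_single_zone_ram(unsigned long start_pfn,
                                     unsigned long nr_pages)
{
        struct zone *zone = NULL;
        unsigned long pfn;

        for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
                if (!pfn_valid(pfn))
                        return false;   /* hole -> 2) violated */
                if (!zone)
                        zone = page_zone(pfn_to_page(pfn));
                else if (page_zone(pfn_to_page(pfn)) != zone)
                        return false;   /* multiple zones -> 3) violated */
        }
        /* A single zone implies a single node -> 1) holds as well. */
        return true;
}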
>
> By default, hotplugged memory already complies with all three;
> only with ZONE_DEVICE stuff might we violate 2) and 3).
>
>> There are three scenarios to handle:
> ...
> ...
>
>> @@ -225,6 +226,9 @@ static int memory_block_offline(struct memory_block *mem)
>> unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
>> int ret;
>>
>> + if (!mem->zone)
>> + return -EBUSY;
>
> Shouldn't we return -EINVAL? I mean, -EBUSY reads like this might be a
> temporary error which might get fixed later on, but that isn't the case.
>
>> @@ -234,7 +238,7 @@ static int memory_block_offline(struct memory_block *mem)
>> -nr_vmemmap_pages);
>>
>> ret = offline_pages(start_pfn + nr_vmemmap_pages,
>> - nr_pages - nr_vmemmap_pages, mem->group);
>> + nr_pages - nr_vmemmap_pages, mem->zone, mem->group);
>
> Why not passing the node as well?
The zone implies the node, and the prototype now matches that of
online_pages(). So if we ever wanted to change that, we should do it for
both functions, but I don't necessarily see the need for it.
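For reference, the rough shape of the two prototypes after this patch
(a sketch, the tree has the authoritative signatures):

int online_pages(unsigned long pfn, unsigned long nr_pages,
                 struct zone *zone, struct memory_group *group);
int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
                  struct zone *zone, struct memory_group *group);

Anything that really needs the node can derive it via zone_to_nid(zone).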
>
>> +static struct zone *early_node_zone_for_memory_block(struct memory_block *mem,
>> + int nid)
>> +{
>> + const unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
>> + const unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
>> + struct zone *zone, *matching_zone = NULL;
>> + pg_data_t *pgdat = NODE_DATA(nid);
>
> I was about to complain because in init_memory_block() you call
> early_node_zone_for_memory_block() with nid == NUMA_NO_NODE, but then
> I saw that NODE_DATA on !CONFIG_NUMA falls back to contig_page_data.
> So, I guess we cannot really reach this on CONFIG_NUMA machines with nid
> being NUMA_NO_NODE, right? (do we want to add a check just in case?)
>
Yes, on CONFIG_NUMA this is only called via memory_block_set_nid(),
which itself is only available with CONFIG_NUMA, and calling it with
NUMA_NO_NODE would be a BUG.
(before sending this out I even had a BUG_ON() in memory_block_set_nid()
to verify that, but I removed it because BUG_ONs are frowned upon.)
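If we do want a softer check, here is a sketch of how it could look,
together with the zone walk that follows the part quoted above
(illustration, not the literal patch body):

static struct zone *early_node_zone_for_memory_block(struct memory_block *mem,
                                                     int nid)
{
        const unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
        const unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
        struct zone *zone, *matching_zone = NULL;
        pg_data_t *pgdat;

        /* Softer alternative to the BUG_ON() I dropped. */
        if (WARN_ON_ONCE(nid == NUMA_NO_NODE))
                return NULL;
        pgdat = NODE_DATA(nid);

        /*
         * Accept the range only if exactly one of the node's zones
         * intersects it; otherwise we cannot determine a single zone.
         */
        for (zone = pgdat->node_zones;
             zone < pgdat->node_zones + MAX_NR_ZONES; zone++) {
                if (!zone_intersects(zone, start_pfn, nr_pages))
                        continue;
                if (matching_zone)
                        return NULL;
                matching_zone = zone;
        }
        return matching_zone;
}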
--
Thanks,
David / dhildenb