Message-ID: <d982df07-e7d1-4d4f-a2c3-857901ccc0d0@linux.ibm.com>
Date: Fri, 11 Apr 2025 00:27:28 +0530
From: Donet Tom <donettom@...ux.ibm.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
linux-kernel@...r.kernel.org, David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
Ritesh Harjani <ritesh.list@...il.com>, rafael@...nel.org,
Danilo Krummrich <dakr@...nel.org>
Subject: Re: [PATCH 2/2] base/node: Use
curr_node_memblock_intersect_memory_block to Get Memory Block NID if
CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set
On 4/10/25 1:37 PM, Mike Rapoport wrote:
> On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
>> In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>> set, we iterate over all PFNs in the memory block and use
>> early_pfn_to_nid to find the NID until a match is found.
>>
>> This patch we are using curr_node_memblock_intersect_memory_block() to
>> check if the current node's memblock intersects with the memory block
>> passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
>> is found, the memory block is added to the current node.
>>
>> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
>> for finding the NID will continue to be used.
> I don't think we really need different mechanisms for different settings of
> CONFIG_DEFERRED_STRUCT_PAGE_INIT.
>
> node_dev_init() runs after all struct pages are already initialized and can
> always use pfn_to_nid().
In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, we do a binary search over the memblock regions to
determine the pfn's nid. Otherwise, we use pfn_to_nid() to obtain
it.
Your point is that we could unify this logic and always use
pfn_to_nid() to determine the pfn's nid, regardless of whether
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that correct?
>
> kernel_init_freeable() ->
> page_alloc_init_late(); /* completes initialization of deferred pages */
> ...
> do_basic_setup() ->
> driver_init() ->
> node_dev_init();
>
> The next step could be refactoring register_mem_block_under_node_early() to
> loop over memblock regions rather than over pfns.
So in the current implementation, the call chain is:
node_dev_init()
    register_one_node()
        register_memory_blocks_under_node()
            walk_memory_blocks()
                register_mem_block_under_node_early()
                    get_nid_for_pfn()
We get each node's start and end PFN from its pg_data. Using these
values, we determine the first and last memory block within the
current node's span. To identify the node each memory block
belongs to, we iterate over every PFN in the block.
The problem I am facing is this:
On my system, node4 has memory blocks ranging from memory30351
to memory38524, plus memory128433. The memory blocks between
memory38524 and memory128433 do not belong to this node.
In walk_memory_blocks(), we iterate over all memory blocks in the
node's span, including those between memory38524 and memory128433.
In register_mem_block_under_node_early(), for blocks up to
memory38524, the first pfn returns the matching nid and the function
returns right away. But after memory38524 and until memory128433,
the loop checks the nid of every pfn in each block. Since no nid
matches the required one, the loop runs to the end of every block.
This causes the soft lockups.
This issue occurs only when CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, because a binary search is used to determine each PFN's
nid. When this option is disabled, pfn_to_nid() is faster (the nid is
read directly from the struct page), and the issue is not seen.
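To make the cost concrete, here is a minimal userspace sketch of the per-PFN scan described above. It is not the kernel code: struct region, model_early_pfn_to_nid() and block_spans_node_by_pfn() are hypothetical stand-ins, and the lookups counter exists only to show how many nid lookups one memory block costs.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memblock region: PFNs [start_pfn, end_pfn) on node nid. */
struct region {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;
};

/* Counts nid lookups so the cost of a scan can be inspected. */
static unsigned long lookups;

/*
 * Model of a binary-search nid lookup over sorted, non-overlapping
 * regions. Returns -1 if the PFN falls into a hole.
 */
static int model_early_pfn_to_nid(const struct region *r, size_t n,
				  unsigned long pfn)
{
	size_t lo = 0, hi = n;

	lookups++;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pfn < r[mid].start_pfn)
			hi = mid;
		else if (pfn >= r[mid].end_pfn)
			lo = mid + 1;
		else
			return r[mid].nid;
	}
	return -1;
}

/*
 * Per-PFN scan of one memory block [start_pfn, end_pfn] (inclusive).
 * Stops at the first PFN that belongs to nid; otherwise it visits
 * every PFN in the block.
 */
static int block_spans_node_by_pfn(const struct region *r, size_t n,
				   unsigned long start_pfn,
				   unsigned long end_pfn, int nid)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn <= end_pfn; pfn++)
		if (model_early_pfn_to_nid(r, n, pfn) == nid)
			return 1;
	return 0;
}
```

In this model, a block whose first PFN is on the requested node costs a single lookup, while a block that belongs to another node costs one binary search per PFN in the block.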
To speed up the code when CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, I added this function, which iterates over the memblock
regions for each memory block to determine its nid, instead of
walking every PFN in the block.
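The idea behind the range check can be sketched in userspace as follows. This is only an illustration of an interval-overlap test and not the implementation of curr_node_memblock_intersect_memory_block() from patch 1/2; struct mem_region and node_regions_intersect_block() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock region: PFNs [start_pfn, end_pfn], both inclusive. */
struct mem_region {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* inclusive */
	int nid;
};

/*
 * Return true if any region on node nid overlaps the memory block
 * [start_pfn, end_pfn] (inclusive). The cost is one comparison per
 * region, independent of the number of PFNs in the block.
 */
static bool node_regions_intersect_block(const struct mem_region *regions,
					 size_t nr, unsigned long start_pfn,
					 unsigned long end_pfn, int nid)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (regions[i].nid != nid)
			continue;
		/* Two closed intervals overlap iff each starts before the other ends. */
		if (regions[i].start_pfn <= end_pfn &&
		    regions[i].end_pfn >= start_pfn)
			return true;
	}
	return false;
}
```

With this shape, a memory block that does not belong to the node is rejected after looking at the regions once, rather than after visiting each of its PFNs.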
"Loop over memblock regions instead of iterating over PFNs" -
My question is: do you mean that, in register_one_node(), we should
iterate over all memblock regions, identify the regions belonging to
the current node, and then look up the corresponding memory blocks to
register them under that node?
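One possible reading of that approach, as a userspace sketch only: walk the regions, skip those on other nodes, and derive the memory block ids each remaining region covers. All names here (struct node_region, register_blocks_by_region(), record_block()) are hypothetical, and the sketch assumes the regions are sorted by start PFN and do not overlap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock region: PFNs [start_pfn, end_pfn) on node nid. */
struct node_region {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;
};

/* Callback invoked once per memory block id that should be linked to nid. */
typedef void (*register_fn)(int nid, unsigned long block_id, void *ctx);

/* Example callback state: remembers how many blocks were registered. */
struct reg_stats {
	unsigned long count;
	unsigned long first_id;
	unsigned long last_id;
};

static void record_block(int nid, unsigned long block_id, void *ctx)
{
	struct reg_stats *s = ctx;

	(void)nid;
	if (s->count == 0)
		s->first_id = block_id;
	s->last_id = block_id;
	s->count++;
}

/*
 * Register every memory block touched by a region of node nid.
 * Assumes regions are sorted by start_pfn and non-overlapping, so a
 * block shared by two adjacent regions is only registered once.
 */
static void register_blocks_by_region(const struct node_region *regions,
				      size_t nr, unsigned long pfns_per_block,
				      int nid, register_fn reg, void *ctx)
{
	unsigned long prev_id = 0;
	bool have_prev = false;
	size_t i;

	if (pfns_per_block == 0)
		return;

	for (i = 0; i < nr; i++) {
		unsigned long first, last, id;

		if (regions[i].nid != nid ||
		    regions[i].start_pfn >= regions[i].end_pfn)
			continue;

		first = regions[i].start_pfn / pfns_per_block;
		last = (regions[i].end_pfn - 1) / pfns_per_block;

		/* Skip a block already registered for the previous region. */
		if (have_prev && first <= prev_id)
			first = prev_id + 1;

		for (id = first; id <= last; id++) {
			reg(nid, id, ctx);
			prev_id = id;
			have_prev = true;
		}
	}
}
```

In this form the work is proportional to the number of regions plus the number of blocks that actually belong to the node, so the blocks between memory38524 and memory128433 in the example above would never be visited for node4.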
Thanks
Donet
>
>> Signed-off-by: Donet Tom <donettom@...ux.ibm.com>
>> ---
>> drivers/base/node.c | 37 +++++++++++++++++++++++++++++--------
>> 1 file changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index cd13ef287011..5c5dd02b8bdd 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -20,6 +20,8 @@
>> #include <linux/pm_runtime.h>
>> #include <linux/swap.h>
>> #include <linux/slab.h>
>> +#include <linux/memblock.h>
>> +
>>
>> static const struct bus_type node_subsys = {
>> .name = "node",
>> @@ -782,16 +784,19 @@ static void do_register_memory_block_under_node(int nid,
>> ret);
>> }
>>
>> -/* register memory section under specified node if it spans that node */
>> -static int register_mem_block_under_node_early(struct memory_block *mem_blk,
>> - void *arg)
>> +static int register_mem_block_early_if_dfer_page_init(struct memory_block *mem_blk,
>> + unsigned long start_pfn, unsigned long end_pfn, int nid)
>> {
>> - unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
>> - unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>> - unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
>> - int nid = *(int *)arg;
>> - unsigned long pfn;
>>
>> + if (curr_node_memblock_intersect_memory_block(start_pfn, end_pfn, nid))
>> + do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY);
>> + return 0;
>> +}
>> +
>> +static int register_mem_block_early__normal(struct memory_block *mem_blk,
>> + unsigned long start_pfn, unsigned long end_pfn, int nid)
>> +{
>> + unsigned long pfn;
>> for (pfn = start_pfn; pfn <= end_pfn; pfn++) {
>> int page_nid;
>>
>> @@ -821,6 +826,22 @@ static int register_mem_block_under_node_early(struct memory_block *mem_blk,
>> /* mem section does not span the specified node */
>> return 0;
>> }
>> +/* register memory section under specified node if it spans that node */
>> +static int register_mem_block_under_node_early(struct memory_block *mem_blk,
>> + void *arg)
>> +{
>> + unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
>> + unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>> + unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
>> + int nid = *(int *)arg;
>> +
>> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> + if (system_state < SYSTEM_RUNNING)
>> + return register_mem_block_early_if_dfer_page_init(mem_blk, start_pfn, end_pfn, nid);
>> +#endif
>> + return register_mem_block_early__normal(mem_blk, start_pfn, end_pfn, nid);
>> +
>> +}
>>
>> /*
>> * During hotplug we know that all pages in the memory block belong to the same
>> --
>> 2.48.1
>>