Message-ID: <20250310115305.13599-1-donettom@linux.ibm.com>
Date: Mon, 10 Mar 2025 06:53:05 -0500
From: Donet Tom <donettom@...ux.ibm.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel@...r.kernel.org, David Hildenbrand <david@...hat.com>
Cc: Ritesh Harjani <ritesh.list@...il.com>,
        "Rafael J . Wysocki" <rafael@...nel.org>,
        Danilo Krummrich <dakr@...nel.org>, Donet Tom <donettom@...ux.ibm.com>
Subject: [PATCH] driver/base/node.c: Fix softlockups during the initialization of large systems with interleaved memory blocks

On large systems with more than 64TB of DRAM, if the memory blocks
are interleaved, node initialization (node_dev_init()) can take
a long time because it iterates over each memory block. If a memory
block belongs to the node currently being registered, the first
pfn_to_nid lookup returns the matching nid and the walk moves on.
Otherwise, every PFN in the block is checked against the nid. On
non-preemptible kernels, this can result in a watchdog softlockup
warning. Even though CONFIG_PREEMPT_LAZY is enabled in kernels
now [1], we may still need to fix older stable kernels to avoid
these warnings during boot.

This patch adds a cond_resched() call in node_dev_init() to avoid
this warning.

node_dev_init()
    register_one_node
        register_memory_blocks_under_node
            walk_memory_blocks()
                register_mem_block_under_node_early
                    get_nid_for_pfn
                        early_pfn_to_nid

On my system, node4 has memory blocks ranging from memory30351
to memory38524, plus memory128433. The memory blocks between
memory38524 and memory128433 do not belong to this node.

In walk_memory_blocks(), we iterate over all memory blocks starting
from memory38524 to memory128433.
In register_mem_block_under_node_early(), up to memory38524, the
first pfn returns the matching nid and the function returns
immediately. But after memory38524 and until memory128433, the
loop iterates through every pfn and checks its nid. Since the nid
never matches the required nid, the loop runs to the end of each
block. This causes the soft lockups.
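
To make the cost concrete, here is a minimal userspace model of the
scan described above. It is not kernel code and is only a sketch:
count_nid_lookups(), block_nid[] and pfns_per_block are hypothetical
names, and each block is assumed to belong entirely to one node. A
block owned by the target node costs one lookup; a foreign block
costs one lookup per PFN.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model (assumption, not the kernel implementation):
 * block_nid[b] is the node that owns every PFN of block b.
 * Returns how many per-PFN nid lookups a walk for 'nid' performs.
 */
unsigned long long count_nid_lookups(const int *block_nid, size_t nr_blocks,
				     unsigned long pfns_per_block, int nid)
{
	unsigned long long lookups = 0;

	assert(pfns_per_block > 0);

	for (size_t b = 0; b < nr_blocks; b++) {
		for (unsigned long pfn = 0; pfn < pfns_per_block; pfn++) {
			lookups++;
			/* First matching PFN ends the scan of this block. */
			if (block_nid[b] == nid)
				break;
		}
		/* Foreign blocks fall through after scanning every PFN. */
	}
	return lookups;
}
```

For example, with block owners {4, 4, 0, 0, 0, 4}, 4096 PFNs per
block and nid 4, the three matching blocks cost 3 lookups in total
while the three foreign blocks cost 3 * 4096, for 12291 lookups.
In the layout reported above, 89908 blocks (memory38525 through
memory128432) fall in the foreign range; the number of PFNs per
block depends on the memory block size and page size, which are not
stated here.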

[1]: https://lore.kernel.org/linuxppc-dev/20241116192306.88217-1-sshegde@linux.ibm.com/
Fixes: 2848a28b0a60 ("drivers/base/node: consolidate node device subsystem initialization in node_dev_init()")
Signed-off-by: Donet Tom <donettom@...ux.ibm.com>
---
 drivers/base/node.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ea653fa3433..107eb508e28e 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -975,5 +975,6 @@ void __init node_dev_init(void)
 		ret = register_one_node(i);
 		if (ret)
 			panic("%s() failed to add node: %d\n", __func__, ret);
+		cond_resched();
 	}
 }
-- 
2.43.5

