Message-ID: <Z_dWTU8UsvCHFMpN@localhost.localdomain>
Date: Thu, 10 Apr 2025 07:25:33 +0200
From: Oscar Salvador <osalvador@...e.de>
To: Aditya Gupta <adityag@...ux.ibm.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	Danilo Krummrich <dakr@...nel.org>,
	David Hildenbrand <david@...hat.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	Sourabh Jain <sourabhjain@...ux.ibm.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [REPORT] Softlockups on PowerNV with upstream

On Wed, Apr 09, 2025 at 11:33:44PM +0530, Aditya Gupta wrote:
> Hi,
> 
> While booting the current upstream kernel, I consistently get "softlockups" on an IBM PowerNV system.
> 
> I have tested it only on PowerNV systems, but other architectures/platforms
> might have it as well. PSeries systems don't have this issue, though.
> 
> Bisect points to the following commit:
> 
>     commit 61659efdb35ce6c6ac7639342098f3c4548b794b
>     Author: Gavin Shan <gshan@...hat.com>
>     Date:   Wed Mar 12 09:30:43 2025 +1000
> 
>         drivers/base/memory: improve add_boot_memory_block()
> 
... 
> Console log
> -----------
> 
>     [    2.783371] smp: Brought up 4 nodes, 256 CPUs
>     [    2.783475] numa: Node 0 CPUs: 0-63
>     [    2.783537] numa: Node 2 CPUs: 64-127
>     [    2.783591] numa: Node 4 CPUs: 128-191
>     [    2.783653] numa: Node 6 CPUs: 192-255
>     [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)

If I am not mistaken this is ~700GB of memory, PowerNV uses 16MB as the
section size, and sections_per_block == 1 (I think).
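
Just to put a rough number on it (back-of-the-envelope, I might be off here):

    738197504K / 16384K per section = 45056 sections

so at least ~45k present sections, and with sections_per_block == 1 just as
many memory blocks. __highest_present_section_nr can be quite a bit larger
than that if there are address gaps between the nodes.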

The code before the mentioned commit was something like:

 for (nr = base_section_nr; nr < base_section_nr + sections_per_block; nr++)
       if (present_section_nr(nr))
          section_count++;

 if (section_count == 0)
    return 0;
 return add_memory_block()

So, in the case of PowerNV, we just check one section at a time and
either return or call add_memory_block() depending on whether it is
present.

Now, with the current code things are different. We now have:

memory_dev_init:
 for (nr = 0; nr <= __highest_present_section_nr; nr += 1)
     ret = add_boot_memory_block(nr)

add_boot_memory_block:
 for_each_present_section_nr(base_section_nr, nr) {
     if (nr >= (base_section_nr + sections_per_block))
            break;

     return add_memory_block();
 }
 return 0;

The thing is that next_present_section_nr() (which is called by
for_each_present_section_nr()) will loop until it finds a present
section.
Then we check whether the found section is beyond
base_section_nr + sections_per_block (where sections_per_block == 1).
If so, we skip add_memory_block().
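
For reference, next_present_section_nr() boils down to a linear walk over
section numbers, roughly like this (sketch from memory, the real code lives
in mm/sparse.c and may differ in the details):

    static unsigned long next_present_section_nr(unsigned long section_nr)
    {
            /* walk forward one section at a time until a present one shows up */
            do {
                    section_nr++;
                    if (present_section_nr(section_nr))
                            return section_nr;
            } while (section_nr <= __highest_present_section_nr);

            return -1;
    }

so every call is as expensive as the hole it has to cross.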

Now, I think that the issue comes from for_each_present_section_nr()
having to loop a lot until it finds a present section.
And then the loop in memory_dev_init() advances nr by only 1, which means
that on the next iteration we might have to loop a lot again to find the
next present section. And so on and so forth.
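
To put a rough number on that as well: if there is a hole of 'gap'
non-present sections between two present ones, every memory_dev_init()
iteration that lands inside that hole rescans up to the end of it, so a
single hole costs in the order of

    gap + (gap - 1) + ... + 1 ~= gap^2 / 2

section checks instead of just gap. With the large address gaps between
nodes here that adds up quickly (rough estimate, I have not measured it).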

Maybe we can fix this by making memory_dev_init() remember at which
section add_boot_memory_block() stopped.
Something like the following (only compile-tested):

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8f3a41d9bfaa..d97635cbfd1d 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -816,18 +816,25 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	return 0;
 }

-static int __init add_boot_memory_block(unsigned long base_section_nr)
+static int __init add_boot_memory_block(unsigned long *base_section_nr)
 {
+	int ret;
 	unsigned long nr;

-	for_each_present_section_nr(base_section_nr, nr) {
-		if (nr >= (base_section_nr + sections_per_block))
+	for_each_present_section_nr(*base_section_nr, nr) {
+		if (nr >= (*base_section_nr + sections_per_block))
 			break;

-		return add_memory_block(memory_block_id(base_section_nr),
-					MEM_ONLINE, NULL, NULL);
+		ret = add_memory_block(memory_block_id(*base_section_nr),
+				       MEM_ONLINE, NULL, NULL);
+		*base_section_nr = nr;
+		return ret;
 	}

+	if (nr == -1)
+		*base_section_nr = __highest_present_section_nr + 1;
+	else
+		*base_section_nr = nr;
 	return 0;
 }

@@ -973,9 +980,9 @@ void __init memory_dev_init(void)
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized
 	 */
-	for (nr = 0; nr <= __highest_present_section_nr;
-	     nr += sections_per_block) {
-		ret = add_boot_memory_block(nr);
+	nr = first_present_section_nr();
+	for (; nr <= __highest_present_section_nr; nr += sections_per_block) {
+		ret = add_boot_memory_block(&nr);
 		if (ret)
 			panic("%s() failed to add memory block: %d\n", __func__,
 			      ret);
 

@Aditya: can you please give it a try?



-- 
Oscar Salvador
SUSE Labs
