Message-ID: <20250728190248.605750-1-longman@redhat.com>
Date: Mon, 28 Jul 2025 15:02:48 -0400
From: Waiman Long <longman@...hat.com>
To: Catalin Marinas <catalin.marinas@....com>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Waiman Long <longman@...hat.com>
Subject: [PATCH] mm/kmemleak: Avoid soft lockup in __kmemleak_do_cleanup()

A soft lockup warning was observed on a relatively small x86-64 system
with 16 GB of memory when running a debug kernel with kmemleak
enabled.

  watchdog: BUG: soft lockup - CPU#8 stuck for 33s! [kworker/8:1:134]

The test system was running a workload with hot unplug happening
in parallel. Then kmemleak decided to disable itself due to its
inability to allocate more kmemleak objects. The debug kernel has its
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE set to 40,000.

The soft lockup happened in kmemleak_do_cleanup() when the existing
kmemleak objects were being removed and deleted one-by-one in a loop
via a workqueue. In this particular case, there are at least 40,000
objects that need to be processed and given the slowness of a debug
kernel and the fact that a raw_spinlock has to be acquired and released
in __delete_object(), it could take a while to properly handle all
these objects.

As kmemleak has been disabled in this case, the object removal and
deletion process could be optimized further, since locking isn't really
needed. However, it is probably not worth the effort to optimize for
such an edge case that should rarely happen. So the simple solution is
to call cond_resched() at periodic intervals in the iteration loop to
avoid the soft lockup.

Signed-off-by: Waiman Long <longman@...hat.com>
---
 mm/kmemleak.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 8d588e685311..620abd95e680 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -2181,6 +2181,7 @@ static const struct file_operations kmemleak_fops = {
 static void __kmemleak_do_cleanup(void)
 {
 	struct kmemleak_object *object, *tmp;
+	unsigned int cnt = 0;
 
 	/*
 	 * Kmemleak has already been disabled, no need for RCU list traversal
@@ -2189,6 +2190,10 @@ static void __kmemleak_do_cleanup(void)
 	list_for_each_entry_safe(object, tmp, &object_list, object_list) {
 		__remove_object(object);
 		__delete_object(object);
+
+		/* Call cond_resched() once per 64 iterations to avoid soft lockup */
+		if (!(++cnt & 0x3f))
+			cond_resched();
 	}
 }
 
-- 
2.50.0
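
For readers who want to check the interval arithmetic, here is a minimal
user-space sketch of the counter test added by the patch. It is not kernel
code: fake_cond_resched(), cleanup_sim() and check_yield_points() are
hypothetical names made up for illustration, and the stub only counts how
often the yield point would be reached. With the 40,000 objects mentioned
above, the mask test fires 40000 / 64 = 625 times.

```c
#include <assert.h>

/* Hypothetical stand-in for cond_resched(); it only counts calls. */
unsigned int yield_calls;

void fake_cond_resched(void)
{
	yield_calls++;
}

/*
 * Iterate over nr_objects items and reach a yield point once every
 * 64 iterations, using the same mask test as the patch. Returns the
 * number of yield points that were hit.
 */
unsigned int cleanup_sim(unsigned int nr_objects)
{
	unsigned int cnt = 0;
	unsigned int i;

	yield_calls = 0;
	for (i = 0; i < nr_objects; i++) {
		/* The per-object removal and deletion would happen here. */
		if (!(++cnt & 0x3f))
			fake_cond_resched();
	}
	return yield_calls;
}

/* Small self-check of the boundary cases around one 64-item batch. */
void check_yield_points(void)
{
	assert(cleanup_sim(63) == 0);
	assert(cleanup_sim(64) == 1);
	assert(cleanup_sim(127) == 1);
	assert(cleanup_sim(128) == 2);
}
```

The mask form works because 64 is a power of two, so the low six bits of
the counter are all zero exactly once per 64 increments and no division is
needed in the loop.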

