linux-kernel - [PATCH 3/3] timers: Disable memory pre-allocation of timer debug objects


Message-ID: <20250604220926.870760-4-longman@redhat.com>
Date: Wed,  4 Jun 2025 18:09:26 -0400
From: Waiman Long <longman@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Anna-Maria Behnsen <anna-maria@...utronix.de>,
	Frederic Weisbecker <frederic@...nel.org>
Cc: linux-kernel@...r.kernel.org,
	Waiman Long <longman@...hat.com>
Subject: [PATCH 3/3] timers: Disable memory pre-allocation of timer debug objects

A circular locking dependency lockdep splat was hit recently with a
debug kernel. The dependency chain (in reverse order) is:

  -> #3 (&zone->lock){-.-.}-{2:2}:
  -> #2 (&base->lock){-.-.}-{2:2}:
  -> #1 (&console_sch_key){-.-.}-{2:2}:
  -> #0 (console_owner){..-.}-{0:0}:

The last one (console_owner) comes from calling printk() within the
rmqueue_bulk() call in mm/page_alloc.c. The "base->lock" comes from
lock_timer_base(), and the first one (zone->lock) is due to calling
add_timer_on(), which leads to debug_object_activate() doing an actual
memory allocation and acquiring the zone lock.

The console_sch_key comes from an s390 console driver in drivers/s390/cio.
The console_sch_key -> timer dependency happens because the console
driver sets a timeout value while holding its lock. Apparently it is
pretty common for a console driver to use a timer for timeouts or other
timing purposes, so this may happen with other console drivers as well.

One way to break this circular locking dependency is to disallow any
memory allocation when a timer debug object is being handled. Do this by
setting the ODEBUG_FLAG_NO_ALLOC flag in the timer_debug_descr structure.
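
As a rough illustration of the idea, the user-space sketch below models
a pool refill that is skipped when the object's descriptor carries a
no-allocation flag. Every name in it (SKETCH_FLAG_NO_ALLOC,
sketch_descr, sketch_pool, sketch_fill_pool) is hypothetical and the
logic is simplified; it is not the code in lib/debugobjects.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a "do not allocate" descriptor flag. */
#define SKETCH_FLAG_NO_ALLOC 0x1u

/* Hypothetical, reduced descriptor: only a name and flags. */
struct sketch_descr {
	const char *name;
	unsigned int flags;
};

/* Hypothetical pool model: just counters, no real memory. */
struct sketch_pool {
	unsigned int free_objs;	/* objects currently available */
	unsigned int min_level;	/* refill threshold */
	unsigned int refills;	/* how often the allocator was entered */
};

/*
 * Refill the pool if it is below its threshold, unless the descriptor
 * forbids allocation. Returns true only when a refill was performed.
 */
static bool sketch_fill_pool(struct sketch_pool *pool,
			     const struct sketch_descr *descr)
{
	assert(pool && descr);

	if (pool->free_objs >= pool->min_level)
		return false;	/* enough free objects, nothing to do */

	if (descr->flags & SKETCH_FLAG_NO_ALLOC)
		return false;	/* caller may hold a lock the allocator conflicts with */

	pool->free_objs = pool->min_level;	/* stands in for the real allocation */
	pool->refills++;
	return true;
}
```

In this model, a descriptor with the flag set never enters the
allocation path, so a lock held by its caller cannot end up ordered
before an allocator lock through the refill.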

The figures below show the number of times the debug_objects_fill_pool()
function has reached the statement right before and right after the
no_alloc check, during initial bootup and after running a parallel
kernel build on a 2-socket 96-thread x86-64 system.

                            Before       After    non-timer %
                            ------       -----    -----------
  Initial bootup           150,730     148,198       98.3%
  Parallel kernel build  5,974,464   5,893,116       98.6%

So from an object pre-allocation perspective, timer debug objects
represent just a small slice (roughly 1.4% to 1.7%) of the total number
of debug objects to be processed.
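
The timer share follows directly from the two counts in each row: the
difference between "Before" and "After" is the number of calls that
stopped at the no_alloc check. A minimal sketch of that arithmetic,
using a hypothetical helper name (sketch_timer_share_pct):

```c
#include <assert.h>

/*
 * Percentage of fill-pool calls that were stopped by the no_alloc
 * check, given the counts taken before and after that check.
 * Hypothetical helper for illustration only.
 */
static double sketch_timer_share_pct(unsigned long before, unsigned long after)
{
	assert(before > 0 && after <= before);
	return 100.0 * (double)(before - after) / (double)before;
}
```

With the counts from the table, sketch_timer_share_pct(150730, 148198)
comes out at about 1.7 and sketch_timer_share_pct(5974464, 5893116) at
about 1.4, the complements of the 98.3% and 98.6% non-timer figures.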

The allocation of debug_object to the global pool happens when its object
count falls below the (256 + 16 * num_possible_cpus) threshold. Even
then, there may still be free objects available in the percpu pool. So
the chance that debug_objects gets disabled because it runs out of
free debug_object entries should be minimal.
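
To give a sense of scale for that threshold, the sketch below evaluates
the formula quoted above for a given CPU count. The function name is
hypothetical, and taking the possible-CPU count of the test machine to
be 96 is an assumption; the real value can exceed the number of online
threads.

```c
#include <assert.h>

/*
 * Refill threshold as described in the text:
 * 256 + 16 * (number of possible CPUs).
 * Hypothetical helper, not a kernel function.
 */
static unsigned int sketch_pool_min_level(unsigned int possible_cpus)
{
	assert(possible_cpus > 0);
	return 256u + 16u * possible_cpus;
}
```

Under that assumption, sketch_pool_min_level(96) evaluates to 1792, so
the global pool would still hold well over a thousand objects at the
point where refilling starts.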

Signed-off-by: Waiman Long <longman@...hat.com>
---
 kernel/time/timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 553fa469d7cc..e0be64591e43 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -775,6 +775,7 @@ static bool timer_fixup_assert_init(void *addr, enum debug_obj_state state)

 static const struct debug_obj_descr timer_debug_descr = {
 	.name			= "timer_list",
+	.flags			= ODEBUG_FLAG_NO_ALLOC,
 	.debug_hint		= timer_debug_hint,
 	.is_static_object	= timer_is_static_object,
 	.fixup_init		= timer_fixup_init,
-- 
2.49.0