linux-kernel - Re: [PATCH printk v4 27/27] lockdep: Mark emergency sections in lockdep splats

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zh5J_dxNi0rHG9-I@pathway.suse.cz>
Date: Tue, 16 Apr 2024 11:51:41 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
	Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH printk v4 27/27] lockdep: Mark emergency sections in
 lockdep splats

On Wed 2024-04-03 00:17:29, John Ogness wrote:
> Mark emergency sections wherever multiple lines of
> lock debugging output are generated. In an emergency
> section the CPU will not perform console output for the
> printk() calls. Instead, a flushing of the console
> output is triggered when exiting the emergency section.
> This allows the full message block to be stored as
> quickly as possible in the ringbuffer.
> 
> Signed-off-by: John Ogness <john.ogness@...utronix.de>
> ---
>  kernel/locking/lockdep.c | 91 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 88 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 151bd3de5936..80cfbe7b340e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -5150,6 +5211,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>  #endif
>  	if (unlikely(curr->lockdep_depth >= MAX_LOCK_DEPTH)) {
>  		debug_locks_off();
> +		nbcon_cpu_emergency_enter();
>  		print_lockdep_off("BUG: MAX_LOCK_DEPTH too low!");
>  		printk(KERN_DEBUG "depth: %i  max: %lu!\n",
>  		       curr->lockdep_depth, MAX_LOCK_DEPTH);
> @@ -5157,6 +5219,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>  		lockdep_print_held_locks(current);
>  		debug_show_all_locks();

My concern is that the dump of all locks would not fit into the
buffer. And it might even trigger softlockup. See below.


>  		dump_stack();
> +		nbcon_cpu_emergency_exit();
>  
>  		return 0;
>  	}
> @@ -6609,6 +6688,7 @@ void debug_show_all_locks(void)
>  		pr_warn("INFO: lockdep is turned off.\n");
>  		return;
>  	}
> +	nbcon_cpu_emergency_enter();
>  	pr_warn("\nShowing all locks held in the system:\n");
>  
>  	rcu_read_lock();

The code dumping the locks looks like:

	for_each_process_thread(g, p) {
		if (!p->lockdep_depth)
			continue;
		lockdep_print_held_locks(p);
		touch_nmi_watchdog();
		touch_all_softlockup_watchdogs();
	}

I see two problems here:

1. I belive that the watchdogs are touched for a reason. And they will
   be useless when we print everything in a single emergency context
   and emit all messages at the end.

2. The default config LOG_BUF_SHIFT is 17. So the default kernel
   buffer size is 128kB. The number of messages is bound by
   the number of processes. I am afraid that all messages might
   not fit into the buffer.


I see two solutions:

1. Take the emergency context only around single dump:

		nbcon_cpu_emergency_enter();
		lockdep_print_held_locks(p);
		nbcon_cpu_emergency_exit();


2. Explicitely flush the printk buffer here. Something like:

		lockdep_print_held_locks(p);
		nbcon_cpu_emergency_flush();
		touch_nmi_watchdog();
		touch_all_softlockup_watchdogs();


, where nbcon_cpu_emergency_flush() would do something like:

/**
 * nbcon_cpu_emergency_flush - Explicitly flush consoles in
 *	the middle of emergency context.
 *
 * Both nbcon and legacy consoles are flushed.
 *
 * It should be used only when there are too many messages printed
 * in emergency context, for example, printing backtraces of all
 * CPUs or processes. It is typically needed when the watchdogs
 * have to be touched as well.
 *
void nbcon_cpu_emergency_flush()
{
	/* The explicit flush is needed only in the emergency context. */
	if (!nbcon_get_cpu_emergency_nesting())
		return;

	nbcon_atomic_flush_pending();

	if (printing_via_unlock && !in_nmi()) {
		if (console_trylock())
			console_unlock();
	}
}


I like the 2nd solution. The rule is more or less clear. The
explicit flush is needed when the the code needed to touch watchdogs.

Maybe, we should go even further and call the flush directly
from the touch_*_*watchdog() API. It has effect only in emergency
context.

Best Regards,
Petr