Message-ID: <b8cb2c4084f2cfe0eff68415b502b4e352704401.camel@siemens.com>
Date: Mon, 17 Nov 2025 13:42:40 +0000
From: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>
To: "mingo@...nel.org" <mingo@...nel.org>, "peterz@...radead.org"
<peterz@...radead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "bigeasy@...utronix.de"
<bigeasy@...utronix.de>
CC: "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
"tglx@...utronix.de" <tglx@...utronix.de>, "paulmck@...nel.org"
<paulmck@...nel.org>, "rostedt@...dmis.org" <rostedt@...dmis.org>,
"will@...nel.org" <will@...nel.org>, "joel@...lfernandes.org"
<joel@...lfernandes.org>
Subject: Re: [PATCH 6/9] lockdep: Introduce wait-type checks
Hello Peter and all,
I'm trying to understand the splat I've got with v6.18-rc1:
=============================
[ BUG: Invalid wait context ]
6.18.0-rc1+git476309497a53 #1 Tainted: G O
-----------------------------
some-user-space-process/1251 is trying to lock:
ffff000005cbc548 (&counter->events_list_lock){....}-{3:3}, at: counter_push_event+0x68/0x430 [counter]
other info that might help us debug this:
context-{2:2}
no locks held by some-user-space-process/1251.
stack backtrace:
CPU: 0 UID: 0 PID: 1251 Comm: some-user-space-process Tainted: G O 6.18.0-rc1+git476309497a53 #1 PREEMPT
Call trace:
show_stack+0x20/0x38 (C)
dump_stack_lvl+0x8c/0xd0
dump_stack+0x18/0x28
__lock_acquire+0x91c/0x1f78
lock_acquire+0x1c4/0x338
_raw_spin_lock_irqsave+0x68/0x98
counter_push_event+0x68/0x430 [counter]
interrupt_cnt_isr+0x40/0x78 [interrupt_cnt]
__handle_irq_event_percpu+0xa4/0x398
handle_irq_event+0x54/0xb8
handle_simple_irq+0xe4/0x128
handle_irq_desc+0x48/0x68
generic_handle_domain_irq+0x24/0x38
gpio_irq_handler+0xa0/0x140
handle_irq_desc+0x48/0x68
generic_handle_domain_irq+0x24/0x38
gic_handle_irq+0x54/0x128
call_on_irq_stack+0x30/0x70
do_interrupt_handler+0x88/0x98
el0_interrupt+0x5c/0x270
__el0_irq_handler_common+0x18/0x28
el0t_64_irq_handler+0x10/0x20
el0t_64_irq+0x198/0x1a0
This is not PREEMPT_RT, but it is a CONFIG_PROVE_RAW_LOCK_NESTING=y config...
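If I read the trace correctly, the pattern being flagged boils down to a
hardirq handler taking a spinlock_t (interrupt_cnt_isr() ->
counter_push_event() -> spin_lock_irqsave(&counter->events_list_lock, ...)).
A minimal sketch of that pattern (hypothetical names, not the actual driver
code):

        static DEFINE_SPINLOCK(events_list_lock);  /* spinlock_t: LD_WAIT_CONFIG, the {3:3} above */

        static irqreturn_t demo_isr(int irq, void *dev_id)
        {
                unsigned long flags;

                /*
                 * Hardirq context: lockdep computes curr_inner = LD_WAIT_SPIN
                 * (see check_wait_context() quoted below), so acquiring a
                 * LD_WAIT_CONFIG lock here triggers "Invalid wait context".
                 */
                spin_lock_irqsave(&events_list_lock, flags);
                /* ... push the event ... */
                spin_unlock_irqrestore(&events_list_lock, flags);

                return IRQ_HANDLED;
        }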
The relevant logic in the current v6.18-rcX is still largely unchanged
from the Subject commit and...
On Fri, 2020-03-13 at 18:46 +0100, Sebastian Andrzej Siewior wrote:
> From: Peter Zijlstra <peterz@...radead.org>
>
> Extend lockdep to validate lock wait-type context.
>
> The current wait-types are:
>
> LD_WAIT_FREE, /* wait free, rcu etc.. */
> LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
> LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
> LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
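(For completeness: the full enum also defines LD_WAIT_INV = 0 as the unchecked
catch-all and LD_WAIT_MAX as the sentinel, and, notably, without
CONFIG_PROVE_RAW_LOCK_NESTING LD_WAIT_CONFIG simply aliases LD_WAIT_SPIN.
Roughly, as it looks in current kernels:

        enum lockdep_wait_type {
                LD_WAIT_INV = 0,        /* not checked, catch all */

                LD_WAIT_FREE,           /* wait free, rcu etc.. */
                LD_WAIT_SPIN,           /* spin loops, raw_spinlock_t etc.. */

        #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
                LD_WAIT_CONFIG,         /* preemptible under PREEMPT_RT, spinlock_t etc.. */
        #else
                LD_WAIT_CONFIG = LD_WAIT_SPIN,
        #endif
                LD_WAIT_SLEEP,          /* sleeping locks, mutex_t etc.. */

                LD_WAIT_MAX,            /* must be last */
        };
)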
>
> Where lockdep validates that the current lock (the one being acquired)
> fits in the current wait-context (as generated by the held stack).
>
> This ensures that there is no attempt to acquire mutexes while holding
> spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
> other words, it's a fancier might_sleep().
>
> Obviously RCU made the entire ordeal more complex than a simple single
> value test because RCU can be acquired in (pretty much) any context, and
> while it presents a context to nested locks, it is not the same as the one
> it was acquired in.
>
> Therefore it's necessary to split the wait_type into two values, one
> representing the acquire (outer) and one representing the nested context
> (inner). For most 'normal' locks these two are the same.
>
> [ To make static initialization easier we have the rule that:
> .outer == INV means .outer == .inner; because INV == 0. ]
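(The canonical .outer != .inner case is RCU; elsewhere in this patch
rcu_lock_map is initialized roughly as:

        struct lockdep_map rcu_lock_map = {
                .name = "rcu_read_lock",
                .key = &rcu_lock_key,
                .wait_type_outer = LD_WAIT_FREE,   /* can be acquired in any context */
                .wait_type_inner = LD_WAIT_CONFIG, /* what it presents to nested locks */
        };
)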
>
> It further means that it's required to find the minimal .inner of the held
> stack to compare against the outer of the new lock; because while 'normal'
> RCU presents a CONFIG type to nested locks, if it is taken while already
> holding a SPIN type it obviously doesn't relax the rules.
>
> Below is an example output generated by the trivial test code:
>
> raw_spin_lock(&foo);
> spin_lock(&bar);
> spin_unlock(&bar);
> raw_spin_unlock(&foo);
>
> [ BUG: Invalid wait context ]
> -----------------------------
> swapper/0/1 is trying to lock:
> ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
> other info that might help us debug this:
> 1 lock held by swapper/0/1:
> #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187
>
> The way to read it is to look at the new -{n:m} part in the lock
> description; -{3:3} for the attempted lock, and try to match that up to
> the held locks, which in this case is the one: -{2:2}.
>
> This tells us that the acquiring lock requires a more relaxed environment
> than the one presented by the lock stack.
>
> Currently only the normal locks and RCU are converted; the rest of the
> lockdep users default to .inner = INV, which is ignored. More conversions
> can be done when desired.
>
> The check for spinlock_t nesting is not enabled by default. It's a separate
> config option for now, as there are known problems which are currently being
> addressed. The config option makes it possible to identify these problems
> and to verify that the solutions found indeed solve them.
>
> The config switch will be removed and the checks will be permanently
> enabled once the vast majority of issues have been addressed.
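(That is the CONFIG_PROVE_RAW_LOCK_NESTING switch from my config above; in
lib/Kconfig.debug it looks roughly like:

        config PROVE_RAW_LOCK_NESTING
                bool "Enable raw_spinlock - spinlock nesting checks"
                depends on PROVE_LOCKING
                default n
                help
                  Enable the raw_spinlock vs. spinlock nesting checks which
                  ensure that the lock nesting rules for PREEMPT_RT enabled
                  kernels are not violated.
)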
>
> [ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
> failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
> [ tglx: Add the config option ]
>
> Requested-by: Thomas Gleixner <tglx@...utronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
[...]
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 32406ef0d6a2d..4a9abf8bd9d15 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3798,6 +3806,113 @@ print_lock_nested_lock_not_held(struct task_struct *curr,
> dump_stack();
> }
>
> +static int
> +print_lock_invalid_wait_context(struct task_struct *curr,
> + struct held_lock *hlock)
> +{
> + if (!debug_locks_off())
> + return 0;
> + if (debug_locks_silent)
> + return 0;
> +
> + pr_warn("\n");
> + pr_warn("=============================\n");
> + pr_warn("[ BUG: Invalid wait context ]\n");
> + print_kernel_ident();
> + pr_warn("-----------------------------\n");
> +
> + pr_warn("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
> + print_lock(hlock);
> +
> + pr_warn("other info that might help us debug this:\n");
> + lockdep_print_held_locks(curr);
> +
> + pr_warn("stack backtrace:\n");
> + dump_stack();
> +
> + return 0;
> +}
> +
> +/*
> + * Verify the wait_type context.
> + *
> + * This check validates that we take locks in the right wait-type order; that
> + * is, it ensures that we do not take mutexes inside spinlocks and do not
> + * attempt to acquire spinlocks inside raw_spinlocks and the like.
> + *
> + * The entire thing is slightly more complex because of RCU: RCU is a lock that
> + * can be taken from (pretty much) any context but also has constraints.
> + * However, when taken in a stricter environment the RCU lock does not loosen
> + * the constraints.
> + *
> + * Therefore we must look for the strictest environment in the lock stack and
> + * compare that to the lock we're trying to acquire.
> + */
> +static int check_wait_context(struct task_struct *curr, struct held_lock *next)
> +{
> + short next_inner = hlock_class(next)->wait_type_inner;
> + short next_outer = hlock_class(next)->wait_type_outer;
> + short curr_inner;
> + int depth;
> +
> + if (!curr->lockdep_depth || !next_inner || next->trylock)
> + return 0;
> +
> + if (!next_outer)
> + next_outer = next_inner;
> +
> + /*
> + * Find start of current irq_context..
> + */
> + for (depth = curr->lockdep_depth - 1; depth >= 0; depth--) {
> + struct held_lock *prev = curr->held_locks + depth;
> + if (prev->irq_context != next->irq_context)
> + break;
> + }
> + depth++;
> +
> + /*
> + * Set appropriate wait type for the context; for IRQs we have to take
> + * into account force_irqthread as that is implied by PREEMPT_RT.
> + */
> + if (curr->hardirq_context) {
> + /*
> + * Check if force_irqthreads will run us threaded.
> + */
> + if (curr->hardirq_threaded)
> + curr_inner = LD_WAIT_CONFIG;
> + else
> + curr_inner = LD_WAIT_SPIN;
... even though this has meanwhile become "if (curr->hardirq_threaded ||
curr->irq_config)", it is unclear to me why it is conditional at all and not
always "LD_WAIT_CONFIG" in hardirq context.
With !PREEMPT_RT but CONFIG_PROVE_RAW_LOCK_NESTING=y we always end up with
LD_WAIT_SPIN here, and it always warns if we take a "spinlock_t" in hardirq
context (as in the splat above). But the latter should be legal under all
circumstances, shouldn't it?
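In other words, I would naively have expected just (a sketch of my
expectation only, not a tested change):

        if (curr->hardirq_context)
                /*
                 * spinlock_t should be fine in hardirq context everywhere:
                 * on !PREEMPT_RT it is a spinning lock, and on PREEMPT_RT
                 * the handler would be force-threaded anyway.
                 */
                curr_inner = LD_WAIT_CONFIG;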
> + } else if (curr->softirq_context) {
> + /*
> + * Softirqs are always threaded.
> + */
> + curr_inner = LD_WAIT_CONFIG;
> + } else {
> + curr_inner = LD_WAIT_MAX;
> + }
> +
> + for (; depth < curr->lockdep_depth; depth++) {
> + struct held_lock *prev = curr->held_locks + depth;
> + short prev_inner = hlock_class(prev)->wait_type_inner;
> +
> + if (prev_inner) {
> + /*
> + * We can have a bigger inner than a previous one
> + * when outer is smaller than inner, as with RCU.
> + *
> + * Also due to trylocks.
> + */
> + curr_inner = min(curr_inner, prev_inner);
> + }
> + }
> +
> + if (next_outer > curr_inner)
> + return print_lock_invalid_wait_context(curr, next);
> +
> + return 0;
> +}
> +
> static int __lock_is_held(const struct lockdep_map *lock, int read);
>
> /*
--
Alexander Sverdlin
Siemens AG
www.siemens.com