Message-ID: <b8cb2c4084f2cfe0eff68415b502b4e352704401.camel@siemens.com>
Date: Mon, 17 Nov 2025 13:42:40 +0000
From: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>
To: "mingo@...nel.org" <mingo@...nel.org>, "peterz@...radead.org"
<peterz@...radead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "bigeasy@...utronix.de"
<bigeasy@...utronix.de>
CC: "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
"tglx@...utronix.de" <tglx@...utronix.de>, "paulmck@...nel.org"
<paulmck@...nel.org>, "rostedt@...dmis.org" <rostedt@...dmis.org>,
"will@...nel.org" <will@...nel.org>, "joel@...lfernandes.org"
<joel@...lfernandes.org>
Subject: Re: [PATCH 6/9] lockdep: Introduce wait-type checks
Hello Peter and all,
I'm trying to understand the splat I've got with v6.18-rc1:
=============================
[ BUG: Invalid wait context ]
6.18.0-rc1+git476309497a53 #1 Tainted: G O
-----------------------------
some-user-space-process/1251 is trying to lock:
ffff000005cbc548 (&counter->events_list_lock){....}-{3:3}, at: counter_push_event+0x68/0x430 [counter]
other info that might help us debug this:
context-{2:2}
no locks held by some-user-space-process/1251.
stack backtrace:
CPU: 0 UID: 0 PID: 1251 Comm: some-user-space-process Tainted: G O 6.18.0-rc1+git476309497a53 #1 PREEMPT
Call trace:
show_stack+0x20/0x38 (C)
dump_stack_lvl+0x8c/0xd0
dump_stack+0x18/0x28
__lock_acquire+0x91c/0x1f78
lock_acquire+0x1c4/0x338
_raw_spin_lock_irqsave+0x68/0x98
counter_push_event+0x68/0x430 [counter]
interrupt_cnt_isr+0x40/0x78 [interrupt_cnt]
__handle_irq_event_percpu+0xa4/0x398
handle_irq_event+0x54/0xb8
handle_simple_irq+0xe4/0x128
handle_irq_desc+0x48/0x68
generic_handle_domain_irq+0x24/0x38
gpio_irq_handler+0xa0/0x140
handle_irq_desc+0x48/0x68
generic_handle_domain_irq+0x24/0x38
gic_handle_irq+0x54/0x128
call_on_irq_stack+0x30/0x70
do_interrupt_handler+0x88/0x98
el0_interrupt+0x5c/0x270
__el0_irq_handler_common+0x18/0x28
el0t_64_irq_handler+0x10/0x20
el0t_64_irq+0x198/0x1a0
This is not PREEMPT_RT, but it is a CONFIG_PROVE_RAW_LOCK_NESTING=y config...
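If I read the trace correctly, the pattern being flagged boils down to a
hardirq handler taking a spinlock_t (interrupt_cnt_isr() ->
counter_push_event() -> spin_lock_irqsave(&counter->events_list_lock, ...)).
A minimal sketch of that pattern (hypothetical names, not the actual driver
code):

        static DEFINE_SPINLOCK(events_list_lock);  /* spinlock_t: LD_WAIT_CONFIG, the {3:3} above */

        static irqreturn_t demo_isr(int irq, void *dev_id)
        {
                unsigned long flags;

                /*
                 * Hardirq context: lockdep computes curr_inner = LD_WAIT_SPIN
                 * (see check_wait_context() quoted below), so acquiring a
                 * LD_WAIT_CONFIG lock here triggers "Invalid wait context".
                 */
                spin_lock_irqsave(&events_list_lock, flags);
                /* ... push the event ... */
                spin_unlock_irqrestore(&events_list_lock, flags);

                return IRQ_HANDLED;
        }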
The relevant logic in the current v6.18-rcX is still largely unchanged
from the Subject commit and...
On Fri, 2020-03-13 at 18:46 +0100, Sebastian Andrzej Siewior wrote:
> From: Peter Zijlstra <peterz@...radead.org>
>
> Extend lockdep to validate lock wait-type context.
>
> The current wait-types are:
>
> LD_WAIT_FREE, /* wait free, rcu etc.. */
> LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
> LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
> LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
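(For completeness: the full enum also defines LD_WAIT_INV = 0 as the unchecked
catch-all and LD_WAIT_MAX as the sentinel, and, notably, without
CONFIG_PROVE_RAW_LOCK_NESTING LD_WAIT_CONFIG simply aliases LD_WAIT_SPIN.
Roughly, as it looks in current kernels:

        enum lockdep_wait_type {
                LD_WAIT_INV = 0,        /* not checked, catch all */

                LD_WAIT_FREE,           /* wait free, rcu etc.. */
                LD_WAIT_SPIN,           /* spin loops, raw_spinlock_t etc.. */

        #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
                LD_WAIT_CONFIG,         /* preemptible under PREEMPT_RT, spinlock_t etc.. */
        #else
                LD_WAIT_CONFIG = LD_WAIT_SPIN,
        #endif
                LD_WAIT_SLEEP,          /* sleeping locks, mutex_t etc.. */

                LD_WAIT_MAX,            /* must be last */
        };
)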
>
> Where lockdep validates that the current lock (the one being acquired)
> fits in the current wait-context (as generated by the held stack).
>
> This ensures that there is no attempt to acquire mutexes while holding
> spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
> other words, it's a fancier might_sleep().
>
> Obviously RCU made the entire ordeal more complex than a simple single
> value test because RCU can be acquired in (pretty much) any context, and
> while it presents a context to nested locks, it is not the same as the one
> it was acquired in.
>
> Therefore it's necessary to split the wait_type into two values, one
> representing the acquire (outer) and one representing the nested context
> (inner). For most 'normal' locks these two are the same.
>
> [ To make static initialization easier we have the rule that:
> .outer == INV means .outer == .inner; because INV == 0. ]
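(The canonical .outer != .inner case is RCU; elsewhere in this patch
rcu_lock_map is initialized roughly as:

        struct lockdep_map rcu_lock_map = {
                .name = "rcu_read_lock",
                .key = &rcu_lock_key,
                .wait_type_outer = LD_WAIT_FREE,   /* can be acquired in any context */
                .wait_type_inner = LD_WAIT_CONFIG, /* what it presents to nested locks */
        };
)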
>
> It further means that it's required to find the minimal .inner of the held
> stack to compare against the outer of the new lock; because while 'normal'
> RCU presents a CONFIG type to nested locks, if it is taken while already
> holding a SPIN type it obviously doesn't relax the rules.
>
> Below is an example output generated by the trivial test code:
>
> raw_spin_lock(&foo);
> spin_lock(&bar);
> spin_unlock(&bar);
> raw_spin_unlock(&foo);
>
> [ BUG: Invalid wait context ]
> -----------------------------
> swapper/0/1 is trying to lock:
> ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
> other info that might help us debug this:
> 1 lock held by swapper/0/1:
> #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187
>
> The way to read it is to look at the new -{n:m} part in the lock
> description; -{3:3} for the attempted lock, and try to match that up to
> the held locks, which in this case is the one: -{2:2}.
>
> This tells us that the acquiring lock requires a more relaxed environment
> than the one presented by the lock stack.
>
> Currently only the normal locks and RCU are converted; the rest of the
> lockdep users default to .inner = INV, which is ignored. More conversions
> can be done when desired.
>
> The check for spinlock_t nesting is not enabled by default. It's a separate
> config option for now, as there are known problems which are currently being
> addressed. The config option makes it possible to identify these problems
> and to verify that the solutions found indeed solve them.
>
> The config switch will be removed and the checks will be permanently
> enabled once the vast majority of issues have been addressed.
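(That is the CONFIG_PROVE_RAW_LOCK_NESTING switch from my config above; in
lib/Kconfig.debug it looks roughly like:

        config PROVE_RAW_LOCK_NESTING
                bool "Enable raw_spinlock - spinlock nesting checks"
                depends on PROVE_LOCKING
                default n
                help
                  Enable the raw_spinlock vs. spinlock nesting checks which
                  ensure that the lock nesting rules for PREEMPT_RT enabled
                  kernels are not violated.
)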
>
> [ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
> failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
> [ tglx: Add the config option ]
>
> Requested-by: Thomas Gleixner <tglx@...utronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
[...]
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 32406ef0d6a2d..4a9abf8bd9d15 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3798,6 +3806,113 @@ print_lock_nested_lock_not_held(struct task_struct *curr,
> dump_stack();
> }
>
> +static int
> +print_lock_invalid_wait_context(struct task_struct *curr,
> + struct held_lock *hlock)
> +{
> + if (!debug_locks_off())
> + return 0;
> + if (debug_locks_silent)
> + return 0;
> +
> + pr_warn("\n");
> + pr_warn("=============================\n");
> + pr_warn("[ BUG: Invalid wait context ]\n");
> + print_kernel_ident();
> + pr_warn("-----------------------------\n");
> +
> + pr_warn("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
> + print_lock(hlock);
> +
> + pr_warn("other info that might help us debug this:\n");
> + lockdep_print_held_locks(curr);
> +
> + pr_warn("stack backtrace:\n");
> + dump_stack();
> +
> + return 0;
> +}
> +
> +/*
> + * Verify the wait_type context.
> + *
> + * This check validates that we take locks in the right wait-type order; that
> + * is, it ensures that we do not take mutexes inside spinlocks and do not
> + * attempt to acquire spinlocks inside raw_spinlocks and the like.
> + *
> + * The entire thing is slightly more complex because of RCU: RCU is a lock that
> + * can be taken from (pretty much) any context but also has constraints.
> + * However, when taken in a stricter environment the RCU lock does not loosen
> + * the constraints.
> + *
> + * Therefore we must look for the strictest environment in the lock stack and
> + * compare that to the lock we're trying to acquire.
> + */
> +static int check_wait_context(struct task_struct *curr, struct held_lock *next)
> +{
> + short next_inner = hlock_class(next)->wait_type_inner;
> + short next_outer = hlock_class(next)->wait_type_outer;
> + short curr_inner;
> + int depth;
> +
> + if (!curr->lockdep_depth || !next_inner || next->trylock)
> + return 0;
> +
> + if (!next_outer)
> + next_outer = next_inner;
> +
> + /*
> + * Find start of current irq_context..
> + */
> + for (depth = curr->lockdep_depth - 1; depth >= 0; depth--) {
> + struct held_lock *prev = curr->held_locks + depth;
> + if (prev->irq_context != next->irq_context)
> + break;
> + }
> + depth++;
> +
> + /*
> + * Set appropriate wait type for the context; for IRQs we have to take
> + * into account force_irqthread as that is implied by PREEMPT_RT.
> + */
> + if (curr->hardirq_context) {
> + /*
> + * Check if force_irqthreads will run us threaded.
> + */
> + if (curr->hardirq_threaded)
> + curr_inner = LD_WAIT_CONFIG;
> + else
> + curr_inner = LD_WAIT_SPIN;
... even though this has meanwhile become "if (curr->hardirq_threaded ||
curr->irq_config)", it is unclear to me why it is conditional at all and not
always "LD_WAIT_CONFIG" in hardirq context.
With !PREEMPT_RT but CONFIG_PROVE_RAW_LOCK_NESTING=y we always end up with
LD_WAIT_SPIN here, and it always warns if we take a "spinlock_t" in hardirq
context (as in the splat above). But the latter should be legal under all
circumstances, shouldn't it?
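In other words, I would naively have expected just (a sketch of my
expectation only, not a tested change):

        if (curr->hardirq_context)
                /*
                 * spinlock_t should be fine in hardirq context everywhere:
                 * on !PREEMPT_RT it is a spinning lock, and on PREEMPT_RT
                 * the handler would be force-threaded anyway.
                 */
                curr_inner = LD_WAIT_CONFIG;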
> + } else if (curr->softirq_context) {
> + /*
> + * Softirqs are always threaded.
> + */
> + curr_inner = LD_WAIT_CONFIG;
> + } else {
> + curr_inner = LD_WAIT_MAX;
> + }
> +
> + for (; depth < curr->lockdep_depth; depth++) {
> + struct held_lock *prev = curr->held_locks + depth;
> + short prev_inner = hlock_class(prev)->wait_type_inner;
> +
> + if (prev_inner) {
> + /*
> + * We can have a bigger inner than a previous one
> + * when outer is smaller than inner, as with RCU.
> + *
> + * Also due to trylocks.
> + */
> + curr_inner = min(curr_inner, prev_inner);
> + }
> + }
> +
> + if (next_outer > curr_inner)
> + return print_lock_invalid_wait_context(curr, next);
> +
> + return 0;
> +}
> +
> static int __lock_is_held(const struct lockdep_map *lock, int read);
>
> /*
--
Alexander Sverdlin
Siemens AG
www.siemens.com