linux-kernel - Re: [PATCH] sched: Fix race in rt_mutex_pre_schedule by removing non-atomic fetch_and


Message-ID: <20251006190739.GZ3245006@noisy.programming.kicks-ass.net>
Date: Mon, 6 Oct 2025 21:07:39 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: cuiguoqi <cuiguoqi@...inos.cn>
Cc: rostedt@...dmis.org, bigeasy@...utronix.de, bsegall@...gle.com,
	clrkwllms@...nel.org, dietmar.eggemann@....com, guoqi0226@....com,
	juri.lelli@...hat.com, linux-kernel@...r.kernel.org,
	linux-rt-devel@...ts.linux.dev, mgorman@...e.de, mingo@...hat.com,
	vincent.guittot@...aro.org, vschneid@...hat.com
Subject: Re: [PATCH] sched: Fix race in rt_mutex_pre_schedule by removing
 non-atomic fetch_and_set

On Wed, Aug 27, 2025 at 04:17:50PM +0800, cuiguoqi wrote:
> The issue arises during EDEADLK testing in `lib/locking-selftest.c` when `is_wait_die=1`.
> 
> In this mode, the current thread's `debug_locks` flag is disabled via `__debug_locks_off` (which calls `xchg(&debug_locks, 0)`) during the blocking path of `rt_mutex_slowlock`, specifically in `rt_mutex_slowlock_block()`:
> 
>   rt_mutex_slowlock()
>     rt_mutex_pre_schedule()
>       rt_mutex_slowlock_block()
>         DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock)
>           __debug_locks_off();  // xchg(&debug_locks, 0)
> 
> However, `rt_mutex_post_schedule()` still performs:
> 
>   lockdep_assert(fetch_and_set(current->sched_rt_mutex, 0));
> 
> Which expands to:
> 
>   do {
>       WARN_ON(debug_locks && !( ({ int _x = current->sched_rt_mutex; current->sched_rt_mutex = 0; _x; }) ));
>   } while (0)
> 
> The generated assembly shows that the entire assertion is conditional on `debug_locks`:
> 
>   adrp    x0, debug_locks
>   ldr     w0, [x0]
>   cbz     w0, .label_skip_warn   // Skip WARN if debug_locks == 0
> 
> This means: if `debug_locks` was cleared earlier, the check on `current->sched_rt_mutex` is effectively skipped, and the flag may remain set.
> 
> Later, when the same task re-enters `rt_mutex_slowlock`, it calls `lockdep_reset()` to re-enable `debug_locks`, but the stale `current->sched_rt_mutex` state (left over from the previous lock attempt) causes a false-positive warning in `rt_mutex_pre_schedule()`:
> 
>   WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:7085 rt_mutex_pre_schedule+0xa8/0x108
> 
> Because:
>   - `rt_mutex_pre_schedule()` asserts `!current->sched_rt_mutex`
>   - But the flag was never properly cleared due to the skipped post-schedule check.
> 
> This is not a data race on the flag itself, but a state inconsistency caused by conditional debugging logic. The `fetch_and_set` macro is not atomic, but more importantly, the whole assertion, including the store that clears the flag, is skipped when `debug_locks` is off, which breaks the expected state transition.

Yeah, I can't really make myself care too much. This means you've
already had errors before -- resulting in debug_locks getting cleared.
Fix those and this problem goes away.

debug_locks is inherently racy; I don't see value in trying to fix all
that.