linux-kernel - Re: [PATCH] sched: Fix race in rt_mutex_pre_schedule by removing non-atomic fetch_and


Message-Id: <20250827081750.3606616-1-cuiguoqi@kylinos.cn>
Date: Wed, 27 Aug 2025 16:17:50 +0800
From: cuiguoqi <cuiguoqi@...inos.cn>
To: rostedt@...dmis.org
Cc: bigeasy@...utronix.de,
	bsegall@...gle.com,
	clrkwllms@...nel.org,
	cuiguoqi@...inos.cn,
	dietmar.eggemann@....com,
	guoqi0226@....com,
	juri.lelli@...hat.com,
	linux-kernel@...r.kernel.org,
	linux-rt-devel@...ts.linux.dev,
	mgorman@...e.de,
	mingo@...hat.com,
	peterz@...radead.org,
	vincent.guittot@...aro.org,
	vschneid@...hat.com
Subject: Re: [PATCH] sched: Fix race in rt_mutex_pre_schedule by removing non-atomic fetch_and_set

The issue arises during EDEADLK testing in `lib/locking-selftest.c` when `is_wait_die=1`.

In this mode, the global `debug_locks` flag is cleared via `__debug_locks_off()` (which calls `xchg(&debug_locks, 0)`) on the blocking path of `rt_mutex_slowlock()`, specifically in `rt_mutex_slowlock_block()`:

  rt_mutex_slowlock()
    rt_mutex_pre_schedule()
    ...
      rt_mutex_slowlock_block()
        DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock)
          __debug_locks_off();  // xchg(&debug_locks, 0)
    rt_mutex_post_schedule()
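
The clearing step is a plain exchange on a single global. The minimal user-space sketch below models it with C11 `atomic_exchange()` standing in for the kernel's `xchg()`; `model_debug_locks_off()` and `model_two_failures()` are hypothetical names, not kernel functions. It shows that only the first failure actually flips the switch, and that the switch then stays off for every later check until something sets it back to 1.

```c
#include <assert.h>
#include <stdatomic.h>

/* Model of the kernel's global debug_locks switch (1 = lock debugging on). */
static atomic_int debug_locks = 1;

/*
 * Hypothetical stand-in for __debug_locks_off(): atomically clear the
 * switch and return its previous value, like xchg(&debug_locks, 0).
 */
static int model_debug_locks_off(void)
{
	return atomic_exchange(&debug_locks, 0);
}

/*
 * Simulate two DEBUG_LOCKS_WARN_ON() failures in a row and report what
 * each model_debug_locks_off() call returned.
 */
static void model_two_failures(int *first, int *second)
{
	*first = model_debug_locks_off();   /* 1: this call turned debugging off */
	*second = model_debug_locks_off();  /* 0: it was already off */
	assert(atomic_load(&debug_locks) == 0);
}
```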

However, `rt_mutex_post_schedule()` still performs:

  lockdep_assert(fetch_and_set(current->sched_rt_mutex, 0));

Which expands to:

  do {
      WARN_ON(debug_locks && !( ({ int _x = current->sched_rt_mutex; current->sched_rt_mutex = 0; _x; }) ));
  } while (0)

The generated (arm64) assembly shows that the entire assertion is conditional on `debug_locks`:

  adrp    x0, debug_locks
  ldr     w0, [x0]
  cbz     w0, .label_skip_warn   // Skip WARN if debug_locks == 0

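This means that if `debug_locks` was cleared earlier, the right-hand operand of `&&` is never evaluated: not only is the check on `current->sched_rt_mutex` skipped, but the `current->sched_rt_mutex = 0` store inside `fetch_and_set()` never happens either, so the flag stays set.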
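
The short-circuit can be reproduced outside the kernel. This is a sketch under simplifying assumptions: `fetch_and_set()` and `lockdep_assert()` are user-space copies shaped like the expansion quoted above, `WARN_ON()` is reduced to a counter, and the globals `debug_locks`, `sched_rt_mutex`, `warn_count` and the helper `model_post_schedule()` are hypothetical stand-ins. It needs GCC or Clang, because `fetch_and_set()` uses a GNU statement expression just like the kernel version.

```c
#include <assert.h>

/* GNU statement expression, same shape as the kernel's fetch_and_set(). */
#define fetch_and_set(x, v) ({ int _x = (x); (x) = (v); _x; })

static int debug_locks = 1;   /* model of the global lock-debugging switch */
static int sched_rt_mutex;    /* model of current->sched_rt_mutex */
static int warn_count;        /* how many times the WARN_ON() model fired */

/* Counter-based stand-in for WARN_ON(). */
#define WARN_ON(cond) do { if (cond) warn_count++; } while (0)

/* Same shape as the lockdep_assert() expansion quoted above. */
#define lockdep_assert(cond) \
	do { WARN_ON(debug_locks && !(cond)); } while (0)

/*
 * Hypothetical model of the assertion in rt_mutex_post_schedule():
 * start with the flag set and return its value after the assertion.
 */
static int model_post_schedule(int locks_on)
{
	debug_locks = locks_on;
	sched_rt_mutex = 1;     /* as if a previous pre-schedule step had set it */
	lockdep_assert(fetch_and_set(sched_rt_mutex, 0));
	return sched_rt_mutex;  /* 0 if the store ran, 1 if it was skipped */
}
```

With `debug_locks` set, the flag comes back as 0; with `debug_locks` cleared, it comes back as 1, and in neither case does the warning counter move.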

Later, once `lockdep_reset()` has re-enabled `debug_locks`, the same task re-enters `rt_mutex_slowlock()`, and the stale `current->sched_rt_mutex` value left over from the previous lock attempt causes a false-positive warning in `rt_mutex_pre_schedule()`:

  WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:7085 rt_mutex_pre_schedule+0xa8/0x108

Because:
  - `rt_mutex_pre_schedule()` asserts `!current->sched_rt_mutex`
  - But the flag was never cleared, since the `fetch_and_set()` in `rt_mutex_post_schedule()` was skipped together with its check.
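
The whole sequence can be replayed in a small user-space model. It reuses simplified copies of `fetch_and_set()` and `lockdep_assert()` with `WARN_ON()` reduced to a counter, and it assumes that `rt_mutex_pre_schedule()` performs the mirror-image `lockdep_assert(!fetch_and_set(current->sched_rt_mutex, 1))` (that line is not quoted in this message). All `model_*` helpers and `replay_edeadlk_sequence()` are hypothetical.

```c
#include <assert.h>

/* GNU statement expression, same shape as the kernel's fetch_and_set(). */
#define fetch_and_set(x, v) ({ int _x = (x); (x) = (v); _x; })

static int debug_locks = 1;  /* global lock-debugging switch (model) */
static int sched_rt_mutex;   /* models current->sched_rt_mutex */
static int warnings;         /* number of times the WARN_ON() model fired */

/* lockdep_assert() with WARN_ON() reduced to a counter. */
#define lockdep_assert(cond) \
	do { if (debug_locks && !(cond)) warnings++; } while (0)

/* Assumed shape of rt_mutex_pre_schedule()'s check (scheduler work omitted). */
static void model_pre_schedule(void)
{
	lockdep_assert(!fetch_and_set(sched_rt_mutex, 1));
}

/* Shape of rt_mutex_post_schedule()'s check as quoted above. */
static void model_post_schedule(void)
{
	lockdep_assert(fetch_and_set(sched_rt_mutex, 0));
}

/* Replay the EDEADLK sequence and return how many warnings fired. */
static int replay_edeadlk_sequence(void)
{
	model_pre_schedule();   /* flag 0 -> 1, debug_locks on: no warning */
	debug_locks = 0;        /* DEBUG_LOCKS_WARN_ON() fired while blocking */
	model_post_schedule();  /* short-circuited: flag stays 1 */
	debug_locks = 1;        /* lockdep_reset() re-enables debugging */
	model_pre_schedule();   /* sees the stale flag == 1: one warning */
	return warnings;
}
```

Exactly one warning fires, and it fires on the second pre-schedule step, matching the report above.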

This is not a data race on the flag itself, but a **state inconsistency caused by conditional debugging logic**. The `fetch_and_set` macro is not atomic, but more importantly, the state update lives inside `lockdep_assert()` and is therefore bypassed together with the assertion when `debug_locks` is off, breaking the expected set/clear transition.
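
One way to avoid this class of problem, shown here only as an illustrative sketch and not necessarily what the patch itself does, is to keep the state update outside the assertion so that only the check depends on `debug_locks`. The `fixed_*` helpers and `replay_with_fixed_helpers()` are hypothetical, and the same simplified `lockdep_assert()` model is used, with `WARN_ON()` reduced to a counter.

```c
#include <assert.h>

static int debug_locks = 1;  /* global lock-debugging switch (model) */
static int sched_rt_mutex;   /* models current->sched_rt_mutex */
static int warnings;         /* number of times the WARN_ON() model fired */

/* lockdep_assert() with WARN_ON() reduced to a counter. */
#define lockdep_assert(cond) \
	do { if (debug_locks && !(cond)) warnings++; } while (0)

/*
 * Hypothetical variants: the flag is updated unconditionally and only the
 * check on the old value stays inside lockdep_assert().
 */
static void fixed_pre_schedule(void)
{
	int old = sched_rt_mutex;
	sched_rt_mutex = 1;
	lockdep_assert(!old);
}

static void fixed_post_schedule(void)
{
	int old = sched_rt_mutex;
	sched_rt_mutex = 0;
	lockdep_assert(old);
}

/* Same sequence as before, using the hypothetical fixed helpers. */
static int replay_with_fixed_helpers(void)
{
	fixed_pre_schedule();   /* flag 0 -> 1 */
	debug_locks = 0;        /* debugging switched off mid-way */
	fixed_post_schedule();  /* flag cleared even though debug_locks == 0 */
	debug_locks = 1;        /* debugging re-enabled */
	fixed_pre_schedule();   /* flag was 0: no warning */
	return warnings;
}
```

Replaying the same sequence now produces no warning. The read-modify-write is still non-atomic, which should be acceptable for a per-task flag that these helpers only touch through `current`.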