linux-kernel - [PATCH diagnostic qspinlock] Diagnostics for excessive lock-drop wait loop time

Message-ID: <20230112003627.GA3133092@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 11 Jan 2023 16:36:27 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     riel@...riel.com, davej@...emonkey.org.uk
Cc:     linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: [PATCH diagnostic qspinlock] Diagnostics for excessive lock-drop
 wait loop time

We see systems stuck in the queued_spin_lock_slowpath() loop that waits
for the lock to become unlocked in the case where the current CPU has
set pending state.  Therefore, this not-for-mainline commit gives a warning
that includes the lock word state if the loop has been spinning for more
than 10 seconds.  It also adds a WARN_ON_ONCE() that complains if the
lock is not in pending state.
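The timeout logic and the lock-word formatting described above can be modeled in userspace to see how they behave. The sketch below is a hypothetical simulation, not kernel code. `decode_lock_word()` and `simulate()` are names introduced here. The sketch assumes the qspinlock layout used when `_Q_PENDING_BITS == 8`: locked byte in bits 0-7, pending byte in bits 8-15, and the encoded queue tail in bits 16-31. It also replaces `jiffies` with a counter that advances one tick per iteration. The `WARN_ON_ONCE()` pending-state check is not modeled.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace model, not kernel code.  Assumes the qspinlock
 * layout used when _Q_PENDING_BITS == 8:
 *   bits  0-7   locked byte
 *   bits  8-15  pending byte
 *   bits 16-31  tail (encoded queue tail)
 */
#define Q_LOCKED_MASK   0x000000ffu
#define Q_PENDING_MASK  0x0000ff00u
#define Q_PENDING_SHIFT 8
#define Q_TAIL_SHIFT    16

/* Userspace copy of the wraparound-safe comparison behind time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Format a lock word as the patch's WARN() does: "%#x (%c%c%#x)". */
static void decode_lock_word(uint32_t val, char *buf, size_t len)
{
	unsigned int locked = val & Q_LOCKED_MASK;
	unsigned int pending = (val & Q_PENDING_MASK) >> Q_PENDING_SHIFT;
	unsigned int tail = val >> Q_TAIL_SHIFT;

	snprintf(buf, len, "%#x (%c%c%#x)", (unsigned int)val,
		 ".L"[!!locked], ".P"[!!pending], tail);
}

/*
 * Replay a scripted sequence of lock-word values through the patch's
 * timeout logic.  A fake jiffies counter advances one tick per iteration.
 * Returns the number of timeout warnings that would have been printed.
 */
static int simulate(const uint32_t *vals, int n, int pending_loops,
		    unsigned long start, unsigned long timeout)
{
	unsigned long jiffies = start;
	unsigned long j = start + timeout;	/* deadline; may wrap around */
	int cnt = pending_loops;
	int warnings = 0;

	for (int i = 0; i < n; i++, jiffies++) {
		uint32_t val = vals[i];

		if (!(val & Q_LOCKED_MASK))
			break;			/* owner released the lock */
		if (--cnt != 0)
			continue;		/* clock consulted every pending_loops spins */
		if (time_after(jiffies, j)) {
			char buf[48];

			decode_lock_word(val, buf, sizeof(buf));
			printf("Still pending and locked: %s\n", buf);
			warnings++;		/* cnt left at 0, so it goes negative next */
		} else {
			cnt = pending_loops;	/* deadline not reached: re-arm */
		}
	}
	return warnings;
}
```

Replaying a lock word stuck at 0x101 (locked and pending) yields exactly one warning. Once `WARN()` fires, the patch leaves `cnt` at zero, so the next decrement makes it negative and the deadline is not rechecked during that slowpath call, barring the counter eventually wrapping. Starting the fake clock just below `ULONG_MAX` shows why `time_after()` is used rather than a plain `>`: the deadline wraps to a small value, and a naive comparison would warn on the very first check.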

If this is to be placed in production, some reporting mechanism not
involving spinlocks is likely needed, for example, BPF, trace events,
or some combination thereof.

Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index ac5a3e6d3b564..be1440782c4b3 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -379,8 +379,27 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * clear_pending_set_locked() implementations imply full
 	 * barriers.
 	 */
-	if (val & _Q_LOCKED_MASK)
-		atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_MASK));
+	if (val & _Q_LOCKED_MASK) {
+		int cnt = _Q_PENDING_LOOPS;
+		unsigned long j = jiffies + 10 * HZ;
+		struct qspinlock qval;
+		int val;
+
+		for (;;) {
+			val = atomic_read_acquire(&lock->val);
+			atomic_set(&qval.val, val);
+			WARN_ON_ONCE(!(val & _Q_PENDING_VAL));
+			if (!(val & _Q_LOCKED_MASK))
+				break;
+			cpu_relax();
+			if (!--cnt &&
+			    !WARN(time_after(jiffies, j),
+				  "%s: Still pending and locked: %#x (%c%c%#x)\n",
+				  __func__, val, ".L"[!!qval.locked],
+				  ".P"[!!qval.pending], qval.tail))
+				cnt = _Q_PENDING_LOOPS;
+		}
+	}

 	/*
 	 * take ownership and clear the pending bit.