Date: Thu, 13 Jun 2024 02:12:14 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: torvalds@...ux-foundation.org
Cc: brauner@...nel.org,
	viro@...iv.linux.org.uk,
	jack@...e.cz,
	linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org,
	Mateusz Guzik <mjguzik@...il.com>
Subject: [PATCH 1/2] lockref: speculatively spin waiting for the lock to be released

The usual inc/dec scheme is to try the operation with atomics if the
lock is not taken and to fall back to a locked operation otherwise.
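
For reference, a minimal userspace sketch of that pattern follows. It
is not the kernel's CMPXCHG_LOOP machinery; the toy_lockref layout and
names are purely illustrative (the real lib/lockref.c packs an arch
spinlock next to a 32-bit count in a single 64-bit word):

#include <stdatomic.h>
#include <stdint.h>

/* Low bit stands in for the spinlock, the upper bits hold the count. */
struct toy_lockref {
	_Atomic uint64_t word;
};

#define TOY_LOCKED	1ULL
#define TOY_COUNT_ONE	2ULL

static void toy_lockref_get(struct toy_lockref *ref)
{
	uint64_t old = atomic_load(&ref->word);

	/* Fast path: cmpxchg lock+count in one shot, but only while the
	 * lock bit is observed clear. */
	while (!(old & TOY_LOCKED)) {
		if (atomic_compare_exchange_weak(&ref->word, &old,
						 old + TOY_COUNT_ONE))
			return;
	}

	/* Slow path: acquire the "lock" bit, update, release. */
	for (;;) {
		old = atomic_load(&ref->word) & ~TOY_LOCKED;
		if (atomic_compare_exchange_weak(&ref->word, &old,
						 old | TOY_LOCKED))
			break;
	}
	atomic_fetch_add(&ref->word, TOY_COUNT_ONE);
	atomic_fetch_and(&ref->word, ~TOY_LOCKED);
}

The point of keeping the count and the lock in one word is that the
lockless increment can be refused atomically whenever the lock is held,
which is exactly the situation described below.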

If the lock is only transiently held, routines could instead
speculatively wait and avoid the acquisition altogether.

This has the minor benefit of slightly speeding up such cases, which
do happen in real workloads.

More importantly, this fixes a corner case where lockref consumers
degrade to locked operation and are unable to go back to atomics.

Consider a continuous stream of lockref ops from several threads and
another thread taking the lock for a brief moment. Any lockref user
which manages to observe the lock as taken will fall back to locking
it itself, which in turn increases the likelihood that more threads
executing the code will do the same. This can eventually culminate in
nobody being able to go back to atomics on the problematic lockref.

While this is not a state which can persist in a real workload for
more than a few calls, it very much can persist for the rest of a
benchmark run and consequently grossly distort the results.

Here is an example of will-it-scale doing a 20-way stat(2) on the same
file residing on tmpfs, reporting once per second the number of ops
completed:
[snip several ok results]
min:417297 max:426249 total:8401766	<-- expected performance
min:219024 max:221764 total:4398413	<-- some locking started
min:62855 max:64949 total:1273712	<-- everyone degraded
min:62472 max:64873 total:1268733
min:62179 max:63843 total:1256375
[snip remaining bad results]
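
(For scale, the total column drops from ~8.4M to ~1.27M ops per
one-second report once everyone has degraded, roughly a 6.6x loss.)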

While I did not try to figure out who transiently took the lock (it was
something outside of the benchmark), I devised a trivial reproducer
which triggers the problem almost every time: merely issue "ls" of the
directory containing the tested file (in this case: "ls /tmp").
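
A rough standalone approximation of the benchmarked loop is sketched
below; it is not will-it-scale itself, the path and thread count are
only illustrative, and it does not print per-second counts, so the
degradation itself is best observed with the real benchmark:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

#define NTHREADS 20

/* Each thread hammers stat(2) on the same tmpfs-backed file; running
 * "ls /tmp" once in another shell while this spins is enough to push
 * an unpatched kernel into the degraded state. */
static void *stat_loop(void *arg)
{
	const char *path = arg;
	struct stat st;

	for (;;) {
		if (stat(path, &st) != 0) {
			perror("stat");
			exit(1);
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, stat_loop,
			       (void *)"/tmp/testfile");
	pthread_join(tid[0], NULL);	/* spin indefinitely */
	return 0;
}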

The problem does not persist with the fix below.

Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
---
 lib/lockref.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/lib/lockref.c b/lib/lockref.c
index 2afe4c5d8919..596b521bc1f1 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -26,10 +26,46 @@
 	}									\
 } while (0)
 
+/*
+ * Most routines only ever inc/dec the reference, but the lock may be
+ * transiently held, forcing them to take it as well.
+ *
+ * Should the lock be taken for any reason (including outside of lockref),
+ * a steady stream of ref/unref requests may find itself unable to go back
+ * to lockless operation.
+ *
+ * Combat the problem by giving the routines a way to speculatively wait in
+ * hopes of avoiding having to take the lock.
+ *
+ * The spin count is limited to guarantee forward progress, although the
+ * value is arbitrarily chosen.
+ *
+ * Note this routine is only used if the lock was found to be taken.
+ */
+static inline bool lockref_trywait_unlocked(struct lockref *lockref)
+{
+	struct lockref old;
+	int retry = 100;
+
+	for (;;) {
+		cpu_relax();
+		old.lock_count = READ_ONCE(lockref->lock_count);
+		if (arch_spin_value_unlocked(old.lock.rlock.raw_lock))
+			return true;
+		if (!--retry)
+			return false;
+	}
+}
+
 #else
 
 #define CMPXCHG_LOOP(CODE, SUCCESS) do { } while (0)
 
+static inline bool lockref_trywait_unlocked(struct lockref *lockref)
+{
+	return false;
+}
+
 #endif
 
 /**
@@ -47,6 +83,14 @@ void lockref_get(struct lockref *lockref)
 		return;
 	);
 
+	if (lockref_trywait_unlocked(lockref)) {
+		CMPXCHG_LOOP(
+			new.count++;
+		,
+			return;
+		);
+	}
+
 	spin_lock(&lockref->lock);
 	lockref->count++;
 	spin_unlock(&lockref->lock);
@@ -70,6 +114,16 @@ int lockref_get_not_zero(struct lockref *lockref)
 		return 1;
 	);
 
+	if (lockref_trywait_unlocked(lockref)) {
+		CMPXCHG_LOOP(
+			new.count++;
+			if (old.count <= 0)
+				return 0;
+		,
+			return 1;
+		);
+	}
+
 	spin_lock(&lockref->lock);
 	retval = 0;
 	if (lockref->count > 0) {
@@ -98,6 +152,16 @@ int lockref_put_not_zero(struct lockref *lockref)
 		return 1;
 	);
 
+	if (lockref_trywait_unlocked(lockref)) {
+		CMPXCHG_LOOP(
+			new.count--;
+			if (old.count <= 1)
+				return 0;
+		,
+			return 1;
+		);
+	}
+
 	spin_lock(&lockref->lock);
 	retval = 0;
 	if (lockref->count > 1) {
@@ -125,6 +189,17 @@ int lockref_put_return(struct lockref *lockref)
 	,
 		return new.count;
 	);
+
+	if (lockref_trywait_unlocked(lockref)) {
+		CMPXCHG_LOOP(
+			new.count--;
+			if (old.count <= 0)
+				return -1;
+		,
+			return new.count;
+		);
+	}
+
 	return -1;
 }
 EXPORT_SYMBOL(lockref_put_return);
@@ -181,6 +256,16 @@ int lockref_get_not_dead(struct lockref *lockref)
 		return 1;
 	);
 
+	if (lockref_trywait_unlocked(lockref)) {
+		CMPXCHG_LOOP(
+			new.count++;
+			if (old.count < 0)
+				return 0;
+		,
+			return 1;
+		);
+	}
+
 	spin_lock(&lockref->lock);
 	retval = 0;
 	if (lockref->count >= 0) {
-- 
2.43.0

