Message-ID: <20251119210820.2959128-1-mjguzik@gmail.com>
Date: Wed, 19 Nov 2025 22:08:20 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: dennis@...nel.org
Cc: akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	Mateusz Guzik <mjguzik@...il.com>
Subject: [PATCH] percpu_counter: reduce i-cache footprint of percpu_counter_add_batch() fast path

When compiled with gcc 14.2 for the x86-64 architecture with the ORC
frame unwinder, the fast path still has an unfortunate size of 66 bytes,
just over a cache line, in part from register spilling to facilitate the
fallback.

Moving the fallback out of line solves the problem by shrinking the fast
path to just below 64 bytes.
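For illustration only (this is not part of the patch), below is a
minimal single-threaded userspace sketch of the same hot/cold split; the
struct and function names are made up, and it only mimics the shape of
the kernel code, not its per-cpu or locking semantics:

/*
 * Illustration only, not kernel code: a sketch of splitting a
 * rarely-taken fallback into a noinline helper so the compiler does not
 * spill registers on the hot path. Names and types here are made up.
 */
#include <stdint.h>

struct toy_counter {
	int64_t cur;	/* fast, frequently updated value */
	int64_t total;	/* folded total, updated only on the slow path */
};

/* Cold path: noinline keeps its register pressure out of the caller. */
static void __attribute__((noinline))
toy_counter_add_slowpath(struct toy_counter *c, int64_t amount)
{
	c->total += c->cur + amount;
	c->cur = 0;
}

/* Hot path: intended to stay small, ideally within one cache line. */
static inline void toy_counter_add(struct toy_counter *c, int64_t amount,
				   int32_t batch)
{
	int64_t v = c->cur + amount;

	if (__builtin_expect(v >= batch || v <= -batch, 0)) {
		toy_counter_add_slowpath(c, amount);
		return;
	}
	c->cur = v;
}

int main(void)
{
	struct toy_counter c = { 0, 0 };

	for (int i = 0; i < 1000; i++)
		toy_counter_add(&c, 1, 32);
	return 0;
}

With gcc the effect of the split can be inspected by comparing the
generated assembly of the hot path with and without the noinline helper.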

Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
---
 lib/percpu_counter.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 2891f94a11c6..0cf6f1101903 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -89,24 +89,34 @@ EXPORT_SYMBOL(percpu_counter_set);
  * Safety against interrupts is achieved in 2 ways:
  * 1. the fast path uses local cmpxchg (note: no lock prefix)
  * 2. the slow path operates with interrupts disabled
+ *
+ * The slow path is a separate routine to reduce register spilling by gcc.
  */
-void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
+static void noinline percpu_counter_add_batch_slowpath(struct percpu_counter *fbc,
+						       s64 amount, s32 batch)
 {
 	s64 count;
 	unsigned long flags;
 
+	raw_spin_lock_irqsave(&fbc->lock, flags);
+	/*
+	 * Note: by now we might have migrated to another CPU or the value
+	 * might have changed.
+	 */
+	count = __this_cpu_read(*fbc->counters);
+	fbc->count += count + amount;
+	__this_cpu_sub(*fbc->counters, count);
+	raw_spin_unlock_irqrestore(&fbc->lock, flags);
+}
+
+void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
+{
+	s64 count;
+
 	count = this_cpu_read(*fbc->counters);
 	do {
 		if (unlikely(abs(count + amount) >= batch)) {
-			raw_spin_lock_irqsave(&fbc->lock, flags);
-			/*
-			 * Note: by now we might have migrated to another CPU
-			 * or the value might have changed.
-			 */
-			count = __this_cpu_read(*fbc->counters);
-			fbc->count += count + amount;
-			__this_cpu_sub(*fbc->counters, count);
-			raw_spin_unlock_irqrestore(&fbc->lock, flags);
+			percpu_counter_add_batch_slowpath(fbc, amount, batch);
 			return;
 		}
 	} while (!this_cpu_try_cmpxchg(*fbc->counters, &count, count + amount));
-- 
2.48.1

