Message-Id: <20250403211514.985900-8-paulmck@kernel.org>
Date: Thu,  3 Apr 2025 14:15:13 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: linux-kernel@...r.kernel.org
Cc: kernel-team@...a.com,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Kuniyuki Iwashima <kuniyu@...zon.com>,
	Mateusz Guzik <mjguzik@...il.com>,
	Petr Mladek <pmladek@...e.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	John Ogness <john.ogness@...utronix.de>,
	Sergey Senozhatsky <senozhatsky@...omium.org>,
	Jon Pan-Doh <pandoh@...gle.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	Karolina Stolarek <karolina.stolarek@...cle.com>
Subject: [PATCH RFC 8/9] ratelimit: Reduce ratelimit's false-positive misses

The current ratelimit implementation can suffer from false-positive
misses.  That is, ___ratelimit() might return zero (causing the caller
to invoke rate limiting, for example, by dropping printk()s) even when
the current burst has not yet been consumed.  This happens when one CPU
holds a given ratelimit structure's lock and some other CPU concurrently
invokes ___ratelimit().  The fact that the lock is a raw irq-disabled
spinlock might make low-contention trylock failure seem unlikely, but
vCPU preemption, NMIs, and firmware interrupts can greatly extend the
trylock-failure window.
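
To make the failure mode concrete, here is a minimal userspace model of
the existing trylock-based path (hypothetical names such as
old_ratelimit(); a pthread mutex stands in for the raw spinlock, and a
sleeping thread stands in for vCPU preemption or an NMI extending the
lock-hold time).  The second thread's trylock fails while the lock is
held, so it reports a miss even though the burst still has headroom:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int burst = 10, printed;

/* Model of the current code: trylock failure is reported as a miss. */
static int old_ratelimit(void)
{
	int ok;

	if (pthread_mutex_trylock(&lock))
		return 0;		/* Contended: false-positive miss. */
	ok = printed < burst;
	if (ok)
		printed++;
	pthread_mutex_unlock(&lock);
	return ok;
}

/* Hold the lock for a while, standing in for vCPU preemption or an NMI. */
static void *holder(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	usleep(100 * 1000);
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;
	int ok;

	pthread_create(&t, NULL, holder, NULL);
	usleep(10 * 1000);		/* Let holder() grab the lock. */
	ok = old_ratelimit();
	printf("allowed=%d, burst counts left=%d\n", ok, burst - printed);
	pthread_join(&t, NULL);
	return 0;
}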

Avoiding these false-positive misses is especially important when
correlating console-logged hardware failures with other information.

Therefore, instead of attempting to acquire the lock on each call to
___ratelimit(), construct a lockless fastpath and acquire the lock only
when retriggering (for the next burst) or when resynchronizing (after
either a long idle period or ratelimiting having been disabled).  This
not only reduces the number of lock-hold periods that can be extended
by vCPU preemption, NMIs, and firmware interrupts, but also means that
these extensions must last much longer (generally seconds rather than
milliseconds) before they can result in false-positive drops.
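
Roughly, the new flow looks like the following minimal userspace sketch
(hypothetical names; C11 atomics and a pthread mutex stand in for the
kernel's atomic_t and raw spinlock, and the caller passes in a "now"
value standing in for jiffies; the real code below additionally handles
initialization, disabling, and the missed-callback count).  The common
case avoids the lock entirely, and the lock is taken only to re-arm the
counter for a new interval:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define BURST    5
#define INTERVAL 100			/* In caller-supplied "ticks". */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int n_left = BURST;	/* Counts left in the current burst. */
static atomic_long begin;		/* Start of the current interval. */

static bool sketch_ratelimit(long now)
{
	long b = atomic_load_explicit(&begin, memory_order_acquire);

	if (now - b >= INTERVAL) {
		/* Interval expired: take the lock once to re-arm the burst. */
		pthread_mutex_lock(&lock);
		b = atomic_load_explicit(&begin, memory_order_acquire);
		if (now - b >= INTERVAL) {	/* Nobody beat us to it. */
			atomic_store_explicit(&begin, now, memory_order_release);
			atomic_store_explicit(&n_left, BURST, memory_order_release);
		}
		pthread_mutex_unlock(&lock);
	}

	/* Lockless fastpath: cheap exit once the burst is exhausted. */
	if (atomic_load_explicit(&n_left, memory_order_acquire) <= 0)
		return false;

	/* Claim one count; racing losers see a non-positive old value. */
	return atomic_fetch_sub(&n_left, 1) > 0;
}

int main(void)
{
	int allowed = 0;

	for (long t = 0; t < 250; t++)	/* Back-to-back calls, 2.5 intervals. */
		allowed += sketch_ratelimit(t);
	printf("allowed %d of 250 calls\n", allowed);	/* Expect 3 * BURST = 15. */
	return 0;
}

Again, this is only a sketch of the idea; the patch itself keeps the
existing ->begin/->flags handling, resets the interval from the
lock-holding slowpath, and preserves the suppressed-callback reporting.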

In addition, the lockless fastpath gets a 10-20% speedup compared to
the old fully locked code on my x86 laptop.  Your mileage will of course
vary depending on your hardware, workload, and configuration.

[ paulmck: Apply feedback from Dan Carpenter. ]

Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: Mateusz Guzik <mjguzik@...il.com>
Cc: Petr Mladek <pmladek@...e.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: Jon Pan-Doh <pandoh@...gle.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>
Cc: Karolina Stolarek <karolina.stolarek@...cle.com>
---
 include/linux/ratelimit.h       |   2 +-
 include/linux/ratelimit_types.h |   3 +-
 lib/ratelimit.c                 | 176 +++++++++++++++++++++++++-------
 3 files changed, 144 insertions(+), 37 deletions(-)

diff --git a/include/linux/ratelimit.h b/include/linux/ratelimit.h
index adfec24061d16..7aaad158ee373 100644
--- a/include/linux/ratelimit.h
+++ b/include/linux/ratelimit.h
@@ -44,7 +44,7 @@ static inline void ratelimit_state_reset_interval(struct ratelimit_state *rs, in
 	raw_spin_lock_irqsave(&rs->lock, flags);
 	rs->interval = interval_init;
 	rs->flags &= ~RATELIMIT_INITIALIZED;
-	rs->printed = 0;
+	atomic_set(&rs->rs_n_left, rs->burst);
 	ratelimit_state_reset_miss(rs);
 	raw_spin_unlock_irqrestore(&rs->lock, flags);
 }
diff --git a/include/linux/ratelimit_types.h b/include/linux/ratelimit_types.h
index ef6711b6b229f..2f38345ffc1a5 100644
--- a/include/linux/ratelimit_types.h
+++ b/include/linux/ratelimit_types.h
@@ -18,7 +18,7 @@ struct ratelimit_state {
 
 	int		interval;
 	int		burst;
-	int		printed;
+	atomic_t	rs_n_left;
 	atomic_t	missed;
 	unsigned int	flags;
 	unsigned long	begin;
@@ -28,6 +28,7 @@ struct ratelimit_state {
 		.lock		= __RAW_SPIN_LOCK_UNLOCKED(name.lock),		  \
 		.interval	= interval_init,				  \
 		.burst		= burst_init,					  \
+		.rs_n_left	= ATOMIC_INIT(burst_init),			  \
 		.flags		= flags_init,					  \
 	}
 
diff --git a/lib/ratelimit.c b/lib/ratelimit.c
index bd6e3b429e333..03704c6f8899e 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -26,55 +26,161 @@
  */
 int ___ratelimit(struct ratelimit_state *rs, const char *func)
 {
-	/* Paired with WRITE_ONCE() in .proc_handler().
-	 * Changing two values seperately could be inconsistent
-	 * and some message could be lost.  (See: net_ratelimit_state).
-	 */
-	int interval = READ_ONCE(rs->interval);
+	unsigned long begin;
 	int burst = READ_ONCE(rs->burst);
+	int delta = 0;
 	unsigned long flags;
-	int ret;
-
-	if (!interval)
-		return 1;
+	bool gotlock = false;
+	bool initialized;
+	int interval = READ_ONCE(rs->interval);
+	unsigned long j;
+	int n_left;
 
 	/*
-	 * If we contend on this state's lock then almost
-	 * by definition we are too busy to print a message,
-	 * in addition to the one that will be printed by
-	 * the entity that is holding the lock already:
+	 * If the burst or interval settings mark this ratelimit_state
+	 * structure as disabled, then clear the RATELIMIT_INITIALIZED bit
+	 * in ->flags to force resetting of the ratelimiting interval when
+	 * this ratelimit_state structure is next re-enabled.
 	 */
-	if (!raw_spin_trylock_irqsave(&rs->lock, flags)) {
-		ratelimit_state_inc_miss(rs);
-		return 0;
+	if (burst <= 0 || interval <= 0) {
+		if ((READ_ONCE(rs->flags) & RATELIMIT_INITIALIZED) &&
+		    raw_spin_trylock_irqsave(&rs->lock, flags)) {
+			if (READ_ONCE(rs->flags) & RATELIMIT_INITIALIZED)
+				smp_store_release(&rs->flags, rs->flags & ~RATELIMIT_INITIALIZED);
+			raw_spin_unlock_irqrestore(&rs->lock, flags);
+		}
+		return true;
 	}
 
-	if (!(rs->flags & RATELIMIT_INITIALIZED)) {
-		rs->begin = jiffies;
-		rs->flags |= RATELIMIT_INITIALIZED;
+	/*
+	 * If this structure has just now been ratelimited, but not yet
+	 * reset for the next rate-limiting interval, take an early and
+	 * low-cost exit.
+	 */
+	if (atomic_read_acquire(&rs->rs_n_left) <= 0) /* Pair with release. */
+		goto limited;
+
+	/*
+	 * If this structure is marked as initialized and has been
+	 * recently used, pick up its ->begin field.  Otherwise, pick up
+	 * the current time and attempt to re-initialize the structure.
+	 */
+	j = jiffies;
+	initialized = smp_load_acquire(&rs->flags) & RATELIMIT_INITIALIZED; /* Pair with release. */
+	if (initialized) {
+		begin = READ_ONCE(rs->begin);
+	} else {
+		/*
+		 * Uninitialized or long idle, so reset ->begin and
+		 * mark initialized.  If we fail to acquire the lock,
+		 * let the lock holder do the work.
+		 */
+		begin = j;
+		if (raw_spin_trylock_irqsave(&rs->lock, flags)) {
+			if (!(READ_ONCE(rs->flags) & RATELIMIT_INITIALIZED)) {
+				begin = jiffies;
+				j = begin;
+				WRITE_ONCE(rs->begin, begin);
+				smp_store_release(&rs->flags, /* Pair with acquire. */
+						  rs->flags | RATELIMIT_INITIALIZED);
+				initialized = true;
+			}
+			raw_spin_unlock_irqrestore(&rs->lock, flags);
+		}
 	}
 
-	if (time_is_before_jiffies(rs->begin + interval)) {
-		int m = ratelimit_state_reset_miss(rs);
+	/*
+	 * If this structure is still in the interval in which it has
+	 * already hit the rate limit, take an early and low-cost exit.
+	 */
+	if (initialized && time_before(begin - 2 * interval, j) && time_before(j, begin))
+		goto limited;
 
-		if (m) {
-			if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
-				printk_deferred(KERN_WARNING
-						"%s: %d callbacks suppressed\n", func, m);
-			}
+	/*
+	 * Register another request, and take an early (but not low-cost)
+	 * exit if rate-limiting just now came into effect.
+	 */
+	n_left = atomic_dec_return(&rs->rs_n_left);
+	if (n_left < 0)
+		goto limited; /* Just now started ratelimiting. */
+	if (n_left > 0) {
+		/*
+		 * Otherwise, there is not yet any rate limiting for the
+		 * current interval, and furthermore there is at least one
+		 * count remaining.  But check to see if initialization
+		 * is required or if we have run off the end of the interval
+		 * without rate limiting having been imposed.  Either way,
+		 * we eventually return @true to tell our caller to go ahead.
+		 */
+		if (initialized &&
+		    time_before(begin - interval, j) && time_before(j, begin + interval))
+			return true;  /* Nothing special to do. */
+		if (!raw_spin_trylock_irqsave(&rs->lock, flags))
+			return true; /* Let lock holder do special work. */
+		interval = READ_ONCE(rs->interval);
+		begin = rs->begin;
+		initialized = smp_load_acquire(&rs->flags) & RATELIMIT_INITIALIZED;
+		if (interval <= 0 ||
+		    (initialized &&
+		     time_before(begin - interval, j) && time_before(j, begin + interval))) {
+			/*
+			 * Someone else beat us to the special work,
+			 * so release the lock and return.
+			 */
+			raw_spin_unlock_irqrestore(&rs->lock, flags);
+			return true;
 		}
-		rs->begin   = jiffies;
-		rs->printed = 0;
+
+		/* We have the lock and will do initialization. */
+		gotlock = true;
+		delta = -1;
 	}
-	if (burst && burst > rs->printed) {
-		rs->printed++;
-		ret = 1;
-	} else {
-		ratelimit_state_inc_miss(rs);
-		ret = 0;
+	if (!gotlock) {
+		/*
+		 * We get here if we got the last count (n_left == 0),
+		 * so that rate limiting is in effect for the next caller.
+		 * We will return @true to tell our caller to go ahead,
+		 * but first we acquire the lock and set things up for
+		 * the next rate-limiting interval.
+		 */
+		raw_spin_lock_irqsave(&rs->lock, flags);
+		interval = READ_ONCE(rs->interval);
+		j = jiffies;
+		begin = rs->begin;
+		initialized = smp_load_acquire(&rs->flags) & RATELIMIT_INITIALIZED;
+	}
+	burst = READ_ONCE(rs->burst);
+	if (interval <= 0 || !initialized ||
+	    time_after(j, begin + interval) || time_after(begin - interval, j))
+		begin = j; /* Long delay, reset interval. */
+	else
+		begin += interval; /* Next interval. */
+
+	/*
+	 * If an acquire sees the value stored by either of these two
+	 * store-release operations, it will also see the value from
+	 * the following store to ->begin, or from some later store.  But
+	 * not from any earlier, now-obsolete store to ->begin.
+	 */
+	WRITE_ONCE(rs->begin, begin);
+	atomic_set_release(&rs->rs_n_left, burst + delta); /* Pair with acquire. */
+	smp_store_release(&rs->flags, rs->flags | RATELIMIT_INITIALIZED); /* ^^^ */
+
+	/* Print suppressed callback count if requested. */
+	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
+		delta = ratelimit_state_reset_miss(rs);
+		if (delta)
+			printk_deferred(KERN_WARNING "%s: %d callbacks suppressed\n", func, delta);
 	}
 	raw_spin_unlock_irqrestore(&rs->lock, flags);
+	return true;
 
-	return ret;
+limited:
+	/*
+	 * Count the number of rate-limited requests and tell the caller
+	 * that this is a no-go.
+	 */
+	ratelimit_state_inc_miss(rs);
+	return false;
 }
 EXPORT_SYMBOL(___ratelimit);
-- 
2.40.1

