Date:	Fri, 04 Mar 2011 16:09:08 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	xiaosuo@...il.com, netdev@...r.kernel.org
Subject: [PATCH net-next-2.6] inetpeer: seqlock optimization

On Thursday 03 March 2011 at 00:32 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Thu, 03 Mar 2011 08:39:37 +0100
> 
> > On Wednesday 02 March 2011 at 22:42 -0800, David Miller wrote:
> >> Actually, back to the original topic, I wonder how bad it is to simply
> >> elide the recheck in the create==0 case anyways.  Except for the ipv4
> >> fragmentation wraparound protection values, perfect inetpeer finding
> >> is not necessary for correctness.  And IPv4 fragmentation always calls
> >> inetpeer with create!=0.
> > 
> > We could use a seqlock to detect that a writer might have changed
> > things while we did our RCU lookup?
> 
> That would certainly work.

Here is a patch to implement this idea.

Thanks!

[PATCH net-next-2.6] inetpeer: seqlock optimization

David noticed:

------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.

If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.

This makes the case of a create==0 lookup for a not-present entry
really expensive.  It is on the order of 600 cpu cycles on my
Niagara2.

I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.

This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------
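
For reference, here is a condensed sketch of the pre-patch flow,
reconstructed from the diff below (refcounting and the insert path are
omitted). Every create==0 miss has to pay for the spinlock:

	rcu_read_lock_bh();
	p = lookup_rcu_bh(daddr, base);		/* lockless lookup */
	rcu_read_unlock_bh();

	if (!p) {
		/* Unconditional relookup under the lock, even when
		 * create == 0: the ~600-cycle cost David measured.
		 */
		spin_lock_bh(&base->lock);
		p = lookup(daddr, stack, base);
		spin_unlock_bh(&base->lock);
	}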

One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.

After a failed AVL tree lookup, we can easily detect whether a writer
made changes during our lookup. Taking the lock and redoing the lookup
is only necessary in that case.
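
With the seqlock, the fast path becomes (again a condensed sketch; the
full context is in the diff below):

	rcu_read_lock_bh();
	sequence = read_seqbegin(&base->lock);	/* snapshot sequence counter */
	p = lookup_rcu_bh(daddr, base);
	invalidated = read_seqretry(&base->lock, sequence);
	rcu_read_unlock_bh();

	/* A miss is conclusive unless a writer ran during the lookup,
	 * so the create==0 case can return without taking the lock.
	 */
	if (!p && !create && !invalidated)
		return NULL;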

Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
---
 net/ipv4/inetpeer.c |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 48f8d45..7fd9fab 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -81,19 +81,19 @@ static const struct inet_peer peer_fake_node = {
 
 struct inet_peer_base {
 	struct inet_peer __rcu *root;
-	spinlock_t	lock;
+	seqlock_t	lock;
 	int		total;
 };
 
 static struct inet_peer_base v4_peers = {
 	.root		= peer_avl_empty_rcu,
-	.lock		= __SPIN_LOCK_UNLOCKED(v4_peers.lock),
+	.lock		= __SEQLOCK_UNLOCKED(v4_peers.lock),
 	.total		= 0,
 };
 
 static struct inet_peer_base v6_peers = {
 	.root		= peer_avl_empty_rcu,
-	.lock		= __SPIN_LOCK_UNLOCKED(v6_peers.lock),
+	.lock		= __SEQLOCK_UNLOCKED(v6_peers.lock),
 	.total		= 0,
 };
 
@@ -372,7 +372,7 @@ static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base)
 
 	do_free = 0;
 
-	spin_lock_bh(&base->lock);
+	write_seqlock_bh(&base->lock);
 	/* Check the reference counter.  It was artificially incremented by 1
 	 * in cleanup() function to prevent sudden disappearing.  If we can
 	 * atomically (because of lockless readers) take this last reference,
@@ -409,7 +409,7 @@ static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base)
 		base->total--;
 		do_free = 1;
 	}
-	spin_unlock_bh(&base->lock);
+	write_sequnlock_bh(&base->lock);
 
 	if (do_free)
 		call_rcu_bh(&p->rcu, inetpeer_free_rcu);
@@ -477,12 +477,16 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
 	struct inet_peer __rcu **stack[PEER_MAXDEPTH], ***stackptr;
 	struct inet_peer_base *base = family_to_base(daddr->family);
 	struct inet_peer *p;
+	unsigned int sequence;
+	int invalidated;
 
 	/* Look up for the address quickly, lockless.
 	 * Because of a concurrent writer, we might not find an existing entry.
 	 */
 	rcu_read_lock_bh();
+	sequence = read_seqbegin(&base->lock);
 	p = lookup_rcu_bh(daddr, base);
+	invalidated = read_seqretry(&base->lock, sequence);
 	rcu_read_unlock_bh();
 
 	if (p) {
@@ -493,14 +497,18 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
 		return p;
 	}
 
+	/* If no writer did a change during our lookup, we can return early. */
+	if (!create && !invalidated)
+		return NULL;
+
 	/* retry an exact lookup, taking the lock before.
 	 * At least, nodes should be hot in our cache.
 	 */
-	spin_lock_bh(&base->lock);
+	write_seqlock_bh(&base->lock);
 	p = lookup(daddr, stack, base);
 	if (p != peer_avl_empty) {
 		atomic_inc(&p->refcnt);
-		spin_unlock_bh(&base->lock);
+		write_sequnlock_bh(&base->lock);
 		/* Remove the entry from unused list if it was there. */
 		unlink_from_unused(p);
 		return p;
@@ -524,7 +532,7 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
 		link_to_pool(p, base);
 		base->total++;
 	}
-	spin_unlock_bh(&base->lock);
+	write_sequnlock_bh(&base->lock);
 
 	if (base->total >= inet_peer_threshold)
 		/* Remove one less-recently-used entry. */

