Message-ID: <1299251348.2676.16.camel@edumazet-laptop>
Date: Fri, 04 Mar 2011 16:09:08 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: xiaosuo@...il.com, netdev@...r.kernel.org
Subject: [PATCH net-next-2.6] inetpeer: seqlock optimization
On Thursday 03 March 2011 at 00:32 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Thu, 03 Mar 2011 08:39:37 +0100
>
> > On Wednesday 02 March 2011 at 22:42 -0800, David Miller wrote:
> >> Actually, back to the original topic, I wonder how bad it is to simply
> >> elide the recheck in the create==0 case anyways. Except for the ipv4
> >> fragmentation wraparound protection values, perfect inetpeer finding
> >> is not necessary for correctness. And IPv4 fragmentation always calls
> >> inetpeer with create!=0.
> >
> > We could use a seqlock, to detect that a writer might have changed
> > things while we did our RCU lookup ?
>
> That would certainly work.
Here is a patch to implement this idea.
Thanks!
[PATCH net-next-2.6] inetpeer: seqlock optimization
David noticed:
------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.
If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.
This makes the case of a create==0 lookup for a not-present entry
really expensive. It is on the order of 600 cpu cycles on my
Niagara2.
I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.
This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------
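To make the cost concrete, here is a compact user-space sketch of the pre-patch read path. The names (getpeer_before, peer_base, lookup) are illustrative only, and a pthread mutex plus C11 atomics stand in for the kernel primitives; this is not the kernel code itself. The point is the shape: on a lockless miss, the lock is always taken and the lookup repeated, even for create==0 callers.

/* Pre-patch shape of the read path, reduced to a user-space sketch.
 * The real code walks an AVL tree under RCU; here a single atomic
 * pointer stands in for the tree. */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct peer { int key; };

struct peer_base {
	pthread_mutex_t lock;			/* stands in for base->lock */
	_Atomic(struct peer *) root;		/* stands in for the AVL tree */
};

static struct peer *lookup(struct peer_base *b, int key)
{
	struct peer *p = atomic_load(&b->root);

	return (p && p->key == key) ? p : NULL;
}

struct peer *getpeer_before(struct peer_base *b, int key, int create)
{
	struct peer *p = lookup(b, key);	/* lockless attempt */

	if (p)
		return p;

	/* A concurrent rebalance may have hidden an existing entry, so
	 * every miss pays for a locked re-lookup, even when create == 0. */
	pthread_mutex_lock(&b->lock);
	p = lookup(b, key);
	/* ... allocate and insert here when create != 0 and p == NULL ... */
	pthread_mutex_unlock(&b->lock);
	return p;
}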
One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.
After a failed AVL tree lookup, we can easily detect whether a writer made
changes during our lookup. Taking the lock and redoing the lookup is only
necessary in that case.
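For comparison, a minimal sketch of the read path this change produces, using the same illustrative user-space stand-ins as above (an atomic counter plays the role of the seqlock; the actual patch below uses the kernel's seqlock_t API, read_seqbegin() and read_seqretry()):

/* Post-patch shape: sample a sequence counter around the lockless
 * lookup and fall back to the locked re-lookup only if a writer ran
 * meanwhile.  Same user-space stand-ins as the sketch above, with the
 * counter added to the base structure. */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct peer { int key; };

struct peer_base {
	atomic_uint seq;			/* odd while a writer is rebalancing */
	pthread_mutex_t lock;			/* writer-side lock */
	_Atomic(struct peer *) root;		/* stands in for the AVL tree */
};

static struct peer *lookup(struct peer_base *b, int key)
{
	struct peer *p = atomic_load(&b->root);

	return (p && p->key == key) ? p : NULL;
}

struct peer *getpeer_after(struct peer_base *b, int key, int create)
{
	unsigned int seq;
	struct peer *p;

	do {					/* like read_seqbegin() */
		seq = atomic_load(&b->seq);
	} while (seq & 1);

	p = lookup(b, key);			/* lockless attempt */
	if (p)
		return p;

	/* Counter unchanged => no writer raced with the lookup, so the
	 * miss is definitive and a create == 0 caller returns right away. */
	if (!create && atomic_load(&b->seq) == seq)
		return NULL;

	pthread_mutex_lock(&b->lock);		/* slow path, as before */
	p = lookup(b, key);
	/* ... allocate and insert here when create != 0 and p == NULL ... */
	pthread_mutex_unlock(&b->lock);
	return p;
}

A writer in this sketch would take the mutex, bump the counter to an odd value, rebalance, then bump it back to even before unlocking, mirroring what write_seqlock_bh()/write_sequnlock_bh() do in the patch below.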
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
---
net/ipv4/inetpeer.c | 24 ++++++++++++++++--------
1 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 48f8d45..7fd9fab 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -81,19 +81,19 @@ static const struct inet_peer peer_fake_node = {
struct inet_peer_base {
struct inet_peer __rcu *root;
- spinlock_t lock;
+ seqlock_t lock;
int total;
};
static struct inet_peer_base v4_peers = {
.root = peer_avl_empty_rcu,
- .lock = __SPIN_LOCK_UNLOCKED(v4_peers.lock),
+ .lock = __SEQLOCK_UNLOCKED(v4_peers.lock),
.total = 0,
};
static struct inet_peer_base v6_peers = {
.root = peer_avl_empty_rcu,
- .lock = __SPIN_LOCK_UNLOCKED(v6_peers.lock),
+ .lock = __SEQLOCK_UNLOCKED(v6_peers.lock),
.total = 0,
};
@@ -372,7 +372,7 @@ static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base)
do_free = 0;
- spin_lock_bh(&base->lock);
+ write_seqlock_bh(&base->lock);
/* Check the reference counter. It was artificially incremented by 1
* in cleanup() function to prevent sudden disappearing. If we can
* atomically (because of lockless readers) take this last reference,
@@ -409,7 +409,7 @@ static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base)
base->total--;
do_free = 1;
}
- spin_unlock_bh(&base->lock);
+ write_sequnlock_bh(&base->lock);
if (do_free)
call_rcu_bh(&p->rcu, inetpeer_free_rcu);
@@ -477,12 +477,16 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
struct inet_peer __rcu **stack[PEER_MAXDEPTH], ***stackptr;
struct inet_peer_base *base = family_to_base(daddr->family);
struct inet_peer *p;
+ unsigned int sequence;
+ int invalidated;
/* Look up for the address quickly, lockless.
* Because of a concurrent writer, we might not find an existing entry.
*/
rcu_read_lock_bh();
+ sequence = read_seqbegin(&base->lock);
p = lookup_rcu_bh(daddr, base);
+ invalidated = read_seqretry(&base->lock, sequence);
rcu_read_unlock_bh();
if (p) {
@@ -493,14 +497,18 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
return p;
}
+ /* If no writer did a change during our lookup, we can return early. */
+ if (!create && !invalidated)
+ return NULL;
+
/* retry an exact lookup, taking the lock before.
* At least, nodes should be hot in our cache.
*/
- spin_lock_bh(&base->lock);
+ write_seqlock_bh(&base->lock);
p = lookup(daddr, stack, base);
if (p != peer_avl_empty) {
atomic_inc(&p->refcnt);
- spin_unlock_bh(&base->lock);
+ write_sequnlock_bh(&base->lock);
/* Remove the entry from unused list if it was there. */
unlink_from_unused(p);
return p;
@@ -524,7 +532,7 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
link_to_pool(p, base);
base->total++;
}
- spin_unlock_bh(&base->lock);
+ write_sequnlock_bh(&base->lock);
if (base->total >= inet_peer_threshold)
/* Remove one less-recently-used entry. */
--