Message-ID: <490795FB.2000201@cosmosbay.com>
Date: Tue, 28 Oct 2008 23:45:15 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: David Miller <davem@...emloft.net>
CC: shemminger@...tta.com, benny+usenet@...rsen.dk, minyard@....org,
netdev@...r.kernel.org, paulmck@...ux.vnet.ibm.com,
Christoph Lameter <clameter@....com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Evgeniy Polyakov <johnpol@....mipt.ru>
Subject: Re: [PATCH 2/2] udp: RCU handling for Unicast packets.
Eric Dumazet wrote:
> RCUification of UDP hash tables
>
> Goals are:
>
> 1) Optimize the handling of incoming unicast UDP frames, so that no
>    memory writes happen in the fast path. (Using an array of rwlocks,
>    one per slot for example, is not an option in this regard.)
>
>    Note: multicasts and broadcasts will still need to take a lock,
>    because doing a full lockless lookup in this case is difficult.
>
> 2) No expensive operations in the socket bind/unhash phases:
>
>    - No expensive synchronize_rcu() calls.
>
>    - No rcu_head added to the socket structure. That would increase
>      memory needs but, more importantly, would force us to use
>      call_rcu(), which has the bad property of making socket structures
>      cold: the RCU grace period between a socket being freed and its
>      potential reuse leaves the socket cold in the CPU cache.
>      David did a previous patch using call_rcu() and noticed a 20%
>      impact on TCP connection rates.
>      Quoting Christopher Lameter:
> "Right. That results in cacheline cooldown. You'd want to recycle
> the object as they are cache hot on a per cpu basis. That is screwed
> up by the delayed regular rcu processing. We have seen multiple
> regressions due to cacheline cooldown.
> The only choice in cacheline hot sensitive areas is to deal with the
> complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
>
>    - Because UDP sockets are allocated from a dedicated kmem_cache,
>      use of SLAB_DESTROY_BY_RCU can help here, as sketched below.
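>
>      To illustrate, here is a minimal sketch (not part of this patch;
>      the cache name is made up, and in the patch itself the flag is
>      presumably passed where net/core/sock.c creates the per-protocol
>      slab):
>
>	/*
>	 * With SLAB_DESTROY_BY_RCU, objects freed to this cache can be
>	 * reused immediately, without waiting for an RCU grace period:
>	 * only the underlying pages are kept while readers may be inside
>	 * rcu_read_lock() sections, so lockless readers must revalidate
>	 * every object they find.
>	 */
>	static struct kmem_cache *udp_cachep;
>
>	static int __init udp_cache_init(void)
>	{
>		udp_cachep = kmem_cache_create("udp_sock_demo",
>					       sizeof(struct udp_sock), 0,
>					       SLAB_DESTROY_BY_RCU, NULL);
>		return udp_cachep ? 0 : -ENOMEM;
>	}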
>
> Theory of operation:
> --------------------
>
> As the lookup is lockless (using rcu_read_lock()/rcu_read_unlock()),
> special care must be taken by both readers and writers.
>
> Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
> reused, and inserted in a different chain or, in the worst case, in the
> same chain while readers are doing lookups at the same time.
>
> In order to avoid loops, a reader must check that each socket found in
> a chain really belongs to the chain the reader was traversing. If it
> finds a mismatch, the lookup must restart at the beginning. This
> *restart* loop is the reason we had to use the read lock in the
> multicast case: we don't want to send the same message several times
> to the same socket.
>
> We use RCU only for the fast path. Thus, /proc/net/udp still takes
> read locks.
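>
> For completeness, a rough sketch of the writer (unhash) side under this
> scheme; this is simplified, hypothetical code, not taken from the patch,
> assuming the RCU list helper added alongside sk_for_each_rcu():
>
>	write_lock(&hslot->lock);	/* writers still serialize per slot */
>	sk_del_node_init_rcu(sk);	/* unlink; readers may still see sk */
>	write_unlock(&hslot->lock);
>	sock_put(sk);			/* may free sk right away: no grace
>					 * period, the slab can recycle it at
>					 * once for another socket */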
>
> Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
> ---
> include/net/sock.h | 37 ++++++++++++++++++++++++++++++++++++-
> net/core/sock.c | 3 ++-
> net/ipv4/udp.c | 35 ++++++++++++++++++++++++++---------
> net/ipv4/udplite.c | 1 +
> net/ipv6/udp.c | 25 ++++++++++++++++++-------
> net/ipv6/udplite.c | 1 +
> 6 files changed, 84 insertions(+), 18 deletions(-)
>
On the IPv6 side, I forgot to add a check before compute_score(), like
I did on the IPv4 side:
+	rcu_read_lock();
+begin:
+	result = NULL;
+	badness = -1;
+	sk_for_each_rcu(sk, node, &hslot->head) {
<--- BEGIN missing part --->
		/*
		 * Lockless reader, and SLAB_DESTROY_BY_RCU items:
		 * we must check that this item was not moved to another chain.
		 */
		if (udp_hashfn(net, sk->sk_hash) != hash)
			goto begin;
<--- END missing part --->
		score = compute_score(sk, net, hnum, saddr, sport, daddr, dport, dif);
		if (score > badness) {
			result = sk;
			badness = score;
		}
	}
-	if (result)
-		sock_hold(result);
-	read_unlock(&hslot->lock);
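+	/*
+	 * With SLAB_DESTROY_BY_RCU, "result" may already have been freed
+	 * and reused for another socket: take a reference only if the
+	 * refcount is not zero, then recheck the lookup keys and restart
+	 * if the socket was recycled under us.
+	 */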
+	if (result) {
+		if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
+			result = NULL;
+		else if (unlikely(compute_score(result, net, hnum, saddr, sport,
+					daddr, dport, dif) < badness)) {
+			sock_put(result);
+			goto begin;
+		}
+	}
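
Note that SLAB_DESTROY_BY_RCU only guarantees that the memory stays a
valid struct sock while readers are inside rcu_read_lock() (slab pages
are not returned to the page allocator before a grace period); the
object itself can be recycled at any time. That is why the lookup above
needs both atomic_inc_not_zero() and the second compute_score() call.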
I will submit a new patch series tomorrow, with:

Patch 1: spinlocks instead of rwlocks, plus the bug spotted by Christian Bell
Patch 2: split into two parts (2 & 3), one for IPv4, one for IPv6
Thanks