lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 28 Oct 2008 23:45:15 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	David Miller <davem@...emloft.net>
CC:	shemminger@...tta.com, benny+usenet@...rsen.dk, minyard@....org,
	netdev@...r.kernel.org, paulmck@...ux.vnet.ibm.com,
	Christoph Lameter <clameter@....com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Evgeniy Polyakov <johnpol@....mipt.ru>
Subject: Re: [PATCH 2/2] udp: RCU handling for Unicast packets.

Eric Dumazet a écrit :
> RCUification of UDP hash tables
> 
> Goals are :
> 
> 1) Optimizing handling of incoming Unicast UDP frames, so that no memory
>  writes should happen in the fast path. Using an array of rwlocks (one per
>  slot for example is not an option in this regard)
> 
>  Note: Multicasts and broadcasts still will need to take a lock,
>  because doing a full lockless lookup in this case is difficult.
> 
> 2) No expensive operations in the socket bind/unhash phases :
>   - No expensive synchronize_rcu() calls.
> 
>   - No added rcu_head in socket structure, increasing memory needs,
>   but more important, forcing us to use call_rcu() calls,
>   that have the bad property of making sockets structure cold.
>   (rcu grace period between socket freeing and its potential reuse
>    make this socket being cold in CPU cache).
>   David did a previous patch using call_rcu() and noticed a 20%
>   impact on TCP connection rates.
>   Quoting Cristopher Lameter :
>    "Right. That results in cacheline cooldown. You'd want to recycle
>     the object as they are cache hot on a per cpu basis. That is screwed
>     up by the delayed regular rcu processing. We have seen multiple
>     regressions due to cacheline cooldown.
>     The only choice in cacheline hot sensitive areas is to deal with the
>     complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
> 
>   - Because udp sockets are allocated from dedicated kmem_cache,
>   use of SLAB_DESTROY_BY_RCU can help here.
> 
> Theory of operation :
> ---------------------
> 
> As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
> special attention must be taken by readers and writers.
> 
> Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
> reused, inserted in a different chain or in worst case in the same chain
> while readers could do lookups in the same time.
> 
> In order to avoid loops, a reader must check each socket found in a chain
> really belongs to the chain the reader was traversing. If it finds a
> mismatch, lookup must start again at the begining. This *restart* loop
> is the reason we had to use rdlock for the multicast case, because
> we dont want to send same message several times to the same socket.
> 
> We use RCU only for fast path. Thus, /proc/net/udp still take rdlocks.
> 
> Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
> ---
> include/net/sock.h |   37 ++++++++++++++++++++++++++++++++++++-
> net/core/sock.c    |    3 ++-
> net/ipv4/udp.c     |   35 ++++++++++++++++++++++++++---------
> net/ipv4/udplite.c |    1 +
> net/ipv6/udp.c     |   25 ++++++++++++++++++-------
> net/ipv6/udplite.c |    1 +
> 6 files changed, 84 insertions(+), 18 deletions(-)
> 

On ipv6 side, I forgot to add a check before compute_score(), like I did on ipv4


+	rcu_read_lock();
+begin:
+	result = NULL;
+	badness = -1;
+	sk_for_each_rcu(sk, node, &hslot->head) {

< BEGIN HERE missing part --->
		/*
		 * lockless reader, and SLAB_DESTROY_BY_RCU items:
		 * We must check this item was not moved to another chain
		 */
		if (udp_hashfn(net, sk->sk_hash) != hash)
			goto begin;

< END missing part --->
 		score = compute_score(sk, net, hnum, saddr, sport, daddr, dport, dif);
 		if (score > badness) {
 			result = sk;
 			badness = score;
 		}
 	}
-	if (result)
-		sock_hold(result);
-	read_unlock(&hslot->lock);
+	if (result) {
+		if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
+			result = NULL;
+		else if (unlikely(compute_score(result, net, hnum, saddr, sport,
+					daddr, dport, dif) < badness)) {
+			sock_put(result);
+			goto begin;
+ 		}
+	}


I will submit a new patch serie tomorrow, with :

Patch 1 : spinlocks instead of rwlocks, and bug spotted by Christian Bell

Patch 2 : splited on two parts (2 & 3) , one for IPV4, one for IPV6, 

Thanks

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ