netdev - Re: speed regression in udp_lib_lport

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49790C02.90800@cosmosbay.com>
Date:	Fri, 23 Jan 2009 01:14:58 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Vitaly Mayatskikh <v.mayatskih@...il.com>
CC:	David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: speed regression in udp_lib_lport_inuse()

Vitaly Mayatskikh a écrit :
> At Thu, 22 Jan 2009 23:06:59 +0100, Eric Dumazet wrote:
> 
>>> 		err = bind(s, (const struct sockaddr*)&sa, sizeof(sa));
>> Bug here, if bind() returns -1 (all ports are in use)
> 
> Yeah, there was assert(), but the program drops to problems very soon,
> I was lazy to handle this situation correctly and just removed it ;)
> 
>>> Thanks!
>> Hello Vitaly, thanks for this excellent report.
>>
>> Yes, current code is really not good when all ports are in use :
>>
>> We now have to scan 28232 [1] times long chains of 220 sockets.
>> Thats very long (but at least thread is preemptable)
>>
>> In the past (before patches), only one thread was allowed to run in kernel while scanning
>> udp port table (we had only one global lock udp_hash_lock protecting the whole udp table).
> 
> Very true, my (older) kernel with udp_hash_lock just become totally
> unresponsive after running this test. .29-rc2 become jerky only, but
> still works.
> 
>> This thread was faster because it was not slowed down by other threads.
>> (But the rwlock we used was responsible for starvations of writers if many UDP frames
>> were received)
>>
>>
>>
>> One way to solve the problem could be to use following :
>>
>> 1) Raising UDP_HTABLE_SIZE from 128 to 1024 to reduce average chain lengths.
>>
>> 2) In bind(0) algo, use rcu locking to find a possible usable port. All cpus can run in //, without
>> dirtying locks. Then lock the found chain and recheck port is available before using it.
> 
> I think 2 is definitely better than 1, because 1 is not actually
> fixing anything, but postpones the problem slightly.
> 
>> [1] replace 28232 by your actual /proc/sys/net/ipv4/ip_local_port_range values
>> 61000 - 32768 = 28232
>>
>> I will try to code a patch before this week end.
> 
> Cool!
> 
>> Thanks
>>
>> Note : I tried to use a mutex to force only one thread in bind(0) code but got no real speedup.
>> But it should help if you have a SMP machine, since only one cpu will be busy in bind(0)
>>
> 
> You saved my time, I was thinking about trying mutexes also. Thanks :)
> 

Could you try following patch ?

Thank you

[PATCH] udp: optimize bind(0) if many ports are in use

commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
(udp: Improve port randomization) introduced a regression for UDP bind() syscall
to null port (getting a random port) in case lot of ports are already in use.

This is because we do about 28000 scans of very long chains (220 sockets per chain),
with many spin_lock_bh()/spin_unlock_bh() calls.

Fix this using a bitmap (64 bytes for current value of UDP_HTABLE_SIZE)
so that we scan chains at most once.

Instead of 250 ms per bind() call, we get after patch a time of 2.9 ms 

Based on a report from Vitaly Mayatskikh

Reported-by: Vitaly Mayatskikh <v.mayatskih@...il.com>
Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cf5ab05..adbdbd8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -120,8 +120,11 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
 atomic_t udp_memory_allocated;
 EXPORT_SYMBOL(udp_memory_allocated);
 
+#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)
+
 static int udp_lib_lport_inuse(struct net *net, __u16 num,
 			       const struct udp_hslot *hslot,
+			       unsigned long *bitmap,
 			       struct sock *sk,
 			       int (*saddr_comp)(const struct sock *sk1,
 						 const struct sock *sk2))
@@ -132,12 +135,16 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
 	sk_nulls_for_each(sk2, node, &hslot->head)
 		if (net_eq(sock_net(sk2), net)			&&
 		    sk2 != sk					&&
-		    sk2->sk_hash == num				&&
+		    (bitmap || sk2->sk_hash == num)		&&
 		    (!sk2->sk_reuse || !sk->sk_reuse)		&&
 		    (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
 			|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
-		    (*saddr_comp)(sk, sk2))
-			return 1;
+		    (*saddr_comp)(sk, sk2)) {
+			if (bitmap)
+				__set_bit(sk2->sk_hash / UDP_HTABLE_SIZE, bitmap);
+			else
+				return 1;
+		}
 	return 0;
 }
 
@@ -158,34 +165,44 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 	struct net *net = sock_net(sk);
 
 	if (!snum) {
-		int low, high, remaining;
-		unsigned rand;
+		int low, high;
+		unsigned rand, slotn, bias;
 		unsigned short first;
+		DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);
 
 		inet_get_local_port_range(&low, &high);
-		remaining = (high - low) + 1;
 
 		rand = net_random();
-		snum = first = rand % remaining + low;
-		rand |= 1;
-		for (;;) {
-			hslot = &udptable->hash[udp_hashfn(net, snum)];
+		bias = rand;
+		rand = ((rand >> 16) | 1) * UDP_HTABLE_SIZE;
+		for (slotn = 0; slotn < UDP_HTABLE_SIZE; slotn++) {
+			first = slotn + bias;
+			hslot = &udptable->hash[udp_hashfn(net, first)];
+			bitmap_zero(bitmap, PORTS_PER_CHAIN);
 			spin_lock_bh(&hslot->lock);
-			if (!udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
-				break;
-			spin_unlock_bh(&hslot->lock);
+			udp_lib_lport_inuse(net, snum, hslot, bitmap, sk, saddr_comp);
+
+			snum = first;
+			/*
+			 * PORTS_PER_CHAIN loops, because snum is unsigned short
+			 * and we add an odd multiple of UDP_HTABLE_SIZE
+			 */
 			do {
-				snum = snum + rand;
-			} while (snum < low || snum > high);
-			if (snum == first)
-				goto fail;
+				if (low <= snum && snum <= high &&
+				    !test_bit(snum / UDP_HTABLE_SIZE, bitmap))
+					goto found;
+				snum += rand;
+			} while (snum != first);
+			spin_unlock_bh(&hslot->lock);
 		}
+		goto fail;
 	} else {
 		hslot = &udptable->hash[udp_hashfn(net, snum)];
 		spin_lock_bh(&hslot->lock);
-		if (udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
+		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, saddr_comp))
 			goto fail_unlock;
 	}
+found:
 	inet_sk(sk)->num = snum;
 	sk->sk_hash = snum;
 	if (sk_unhashed(sk)) {

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html