Message-ID: <4979C9B6.3080302@cosmosbay.com>
Date: Fri, 23 Jan 2009 14:44:22 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Vitaly Mayatskikh <v.mayatskih@...il.com>
CC: David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: speed regression in udp_lib_lport_inuse()
Eric Dumazet wrote:
> Vitaly Mayatskikh wrote:
>> At Fri, 23 Jan 2009 01:14:58 +0100, Eric Dumazet wrote:
>>
>>> Could you try the following patch?
>>>
>>> Thank you
>>>
>>> [PATCH] udp: optimize bind(0) if many ports are in use
>>>
>>> commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
>>> (udp: Improve port randomization) introduced a regression for the UDP bind() syscall
>>> to a null port (getting a random port) when a lot of ports are already in use.
>>>
>>> This is because we do about 28000 scans of very long chains (220 sockets per chain),
>>> with many spin_lock_bh()/spin_unlock_bh() calls.
>>>
>>> Fix this using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
>>> so that we scan each chain at most once.
>>>
>>> Instead of 250 ms per bind() call, we get 2.9 ms after the patch.
>> It's much better, thanks. FPS in glxgears now drops only 2x
>> compared with 2.6.28. However, this again kills randomness :)
>> Now the number distribution is k*x^2, with the x-axis zero in the (high - low)
>
> Interesting... Please note I don't search the bitmap from its beginning,
> but from a random point.
>
> Maybe we should study lib/random32.c and discover it has said distribution :)
>
> My algorithm uses net_random() (random32()) to get a 32-bit number, which we
> split into two "16-bit numbers" (bias & rand).
>
> One (bias) selects the starting chain and the starting slot in that chain (so it really should be random):
> first = bias + 0 (slotn = 0)
>
> The other (rand, forced to be odd) selects the next slot in the chain in case the current slot is already in use.
> I feel this is the problem: when we hit a slot outside of ip_local_port_range, it seems we
> escape from this range with the distribution you got. Maybe rand should depend on ip_local_port_range.
>
>
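The v1 patch under discussion is not quoted in this thread, so the following is only a loose userland model of the stepping described above: a 32-bit random split into a 16-bit bias (starting slot) and a 16-bit odd rand (step), walking until a slot inside [low, high] is found. random() standing in for net_random() and the low/high values (the usual ip_local_port_range defaults) are assumptions of the sketch; it just lets you histogram where the walk re-enters the range.

/* Loose userland model of the v1 stepping described above; not kernel
 * code. random() replaces net_random(), locking and hash chains are
 * ignored, and a walk happens on every iteration (i.e. we model the
 * "current slot is already in use" case).
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	unsigned int low = 32768, high = 61000;	/* typical ip_local_port_range */
	unsigned int remaining = high - low + 1;
	unsigned int hist[8] = { 0 };		/* coarse landing-point histogram */
	int i;

	srandom(1);
	for (i = 0; i < 1000000; i++) {
		/* build a 32-bit random and split it into bias & rand */
		unsigned int r = ((unsigned int)random() << 16) |
				 ((unsigned int)random() & 0xffff);
		unsigned short rand = r | 1;	/* odd step */
		unsigned short snum = r >> 16;	/* bias: starting slot */

		/* walk until the slot falls inside the local port range;
		 * snum wraps mod 65536 because it is an unsigned short */
		while (snum < low || snum > high)
			snum += rand;
		hist[(snum - low) * 8 / remaining]++;
	}
	for (i = 0; i < 8; i++)
		printf("bucket %d: %u\n", i, hist[i]);
	return 0;
}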
Here is an updated patch that still has a uniform random distribution,
keeping the starting point in the range [low, high].

My attempt to avoid a divide failed ;)

Thank you
[PATCH] udp: optimize bind(0) if many ports are in use

commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
(udp: Improve port randomization) introduced a regression for the UDP bind() syscall
to a null port (getting a random port) when a lot of ports are already in use.

This is because we do about 28000 scans of very long chains (220 sockets per chain),
with many spin_lock_bh()/spin_unlock_bh() calls.

Fix this using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
so that we scan each chain at most once.

Instead of 250 ms per bind() call, we get 2.9 ms after the patch.
Based on a report from Vitaly Mayatskikh
Reported-by: Vitaly Mayatskikh <v.mayatskih@...il.com>
Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cf5ab05..6e27868 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -120,8 +120,11 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
 atomic_t udp_memory_allocated;
 EXPORT_SYMBOL(udp_memory_allocated);
 
+#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)
+
 static int udp_lib_lport_inuse(struct net *net, __u16 num,
 			       const struct udp_hslot *hslot,
+			       unsigned long *bitmap,
 			       struct sock *sk,
 			       int (*saddr_comp)(const struct sock *sk1,
 						 const struct sock *sk2))
@@ -132,12 +135,16 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
 	sk_nulls_for_each(sk2, node, &hslot->head)
 		if (net_eq(sock_net(sk2), net) &&
 		    sk2 != sk &&
-		    sk2->sk_hash == num &&
+		    (bitmap || sk2->sk_hash == num) &&
 		    (!sk2->sk_reuse || !sk->sk_reuse) &&
 		    (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
 			|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
-		    (*saddr_comp)(sk, sk2))
-			return 1;
+		    (*saddr_comp)(sk, sk2)) {
+			if (bitmap)
+				__set_bit(sk2->sk_hash / UDP_HTABLE_SIZE, bitmap);
+			else
+				return 1;
+		}
 	return 0;
 }
 
@@ -160,32 +167,42 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 	if (!snum) {
 		int low, high, remaining;
 		unsigned rand;
-		unsigned short first;
+		unsigned short first, last;
+		DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);
 
 		inet_get_local_port_range(&low, &high);
 		remaining = (high - low) + 1;
 
 		rand = net_random();
-		snum = first = rand % remaining + low;
-		rand |= 1;
-		for (;;) {
-			hslot = &udptable->hash[udp_hashfn(net, snum)];
+		first = rand % remaining + low;
+		rand = (rand | 1) * UDP_HTABLE_SIZE;
+		for (last = first + UDP_HTABLE_SIZE; first != last; first++) {
+			hslot = &udptable->hash[udp_hashfn(net, first)];
+			bitmap_zero(bitmap, PORTS_PER_CHAIN);
 			spin_lock_bh(&hslot->lock);
-			if (!udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
-				break;
-			spin_unlock_bh(&hslot->lock);
+			udp_lib_lport_inuse(net, snum, hslot, bitmap, sk, saddr_comp);
+
+			snum = first;
+			/*
+			 * PORTS_PER_CHAIN loops, because snum is unsigned short
+			 * and we add an odd multiple of UDP_HTABLE_SIZE
+			 */
 			do {
-				snum = snum + rand;
-			} while (snum < low || snum > high);
-			if (snum == first)
-				goto fail;
+				if (low <= snum && snum <= high &&
+				    !test_bit(snum / UDP_HTABLE_SIZE, bitmap))
+					goto found;
+				snum += rand;
+			} while (snum != first);
+			spin_unlock_bh(&hslot->lock);
 		}
+		goto fail;
 	} else {
 		hslot = &udptable->hash[udp_hashfn(net, snum)];
 		spin_lock_bh(&hslot->lock);
-		if (udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
+		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, saddr_comp))
 			goto fail_unlock;
 	}
+found:
 	inet_sk(sk)->num = snum;
 	sk->sk_hash = snum;
 	if (sk_unhashed(sk)) {
--
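As a sanity check of the comment in the patch ("PORTS_PER_CHAIN loops, because snum is unsigned short and we add an odd multiple of UDP_HTABLE_SIZE"), here is a small userland sketch. UDP_HTABLE_SIZE is hard-coded to 128 (matching the 64-byte bitmap mentioned in the changelog), random() stands in for net_random(), and low/high are the usual range defaults; all three are assumptions of the sketch. It verifies that the do/while visits each of the PORTS_PER_CHAIN slots of the starting chain exactly once before coming back to first:

/* Userland check of the patch's stepping invariant; not kernel code.
 * With UDP_HTABLE_SIZE = 128 (so the bitmap is 512 bits = 64 bytes),
 * stepping an unsigned short by an odd multiple of UDP_HTABLE_SIZE
 * visits every one of the PORTS_PER_CHAIN slots of the starting
 * chain exactly once before returning to "first".
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define UDP_HTABLE_SIZE	128
#define PORTS_PER_CHAIN	(65536 / UDP_HTABLE_SIZE)

int main(void)
{
	unsigned char seen[PORTS_PER_CHAIN];
	unsigned int low = 32768, high = 61000;	/* assumed port range */
	unsigned int remaining = high - low + 1;
	unsigned int rand, visited = 0;
	unsigned short first, snum;

	srandom(1);
	rand = random();
	first = rand % remaining + low;		/* uniform start in [low, high] */
	rand = (rand | 1) * UDP_HTABLE_SIZE;	/* odd multiple of the table size */

	memset(seen, 0, sizeof(seen));
	snum = first;
	do {
		/* every probe stays in first's chain: snum % UDP_HTABLE_SIZE
		 * never changes, only the slot index snum / UDP_HTABLE_SIZE */
		if (seen[snum / UDP_HTABLE_SIZE]++)
			printf("slot %u visited twice!\n", snum / UDP_HTABLE_SIZE);
		visited++;
		snum += rand;			/* wraps mod 65536 */
	} while (snum != first);

	printf("visited %u slots (expected %u)\n", visited, PORTS_PER_CHAIN);
	return 0;
}

Since the step is a multiple of UDP_HTABLE_SIZE, snum % UDP_HTABLE_SIZE never changes, so every probe stays in the chain whose lock is held; forcing the multiplier odd makes it coprime with PORTS_PER_CHAIN, which is what guarantees the full cycle.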