Message-ID: <1271828799.7895.1287.camel@edumazet-laptop>
Date: Wed, 21 Apr 2010 07:46:39 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Evgeniy Polyakov <zbr@...emap.net>
Cc: Ben Greear <greearb@...delatech.com>,
David Miller <davem@...emloft.net>,
Gaspar Chilingarov <gasparch@...il.com>,
netdev <netdev@...r.kernel.org>
Subject: Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge
amount of outgoing connections (unable to bind ... )
On Wednesday, 21 April 2010 at 04:30 +0400, Evgeniy Polyakov wrote:
> On Wed, Apr 21, 2010 at 02:05:14AM +0200, Eric Dumazet (eric.dumazet@...il.com) wrote:
> > I believe the bsockets 'optimization' is a bug, we should remove it.
> >
> > This is a stable candidate (2.6.30+)
> >
> > [PATCH net-next-2.6] tcp: remove bsockets count
> >
> > Counting the number of bound sockets to avoid a loop is buggy, since we can't
> > know how many IP addresses are in use. When the threshold is reached, we try
> > 5 random slots and can fail while there are plenty of available ports.
>
> To go back to exponential bind() times you need to revert the whole
> original patch, including the magic number 5, not only bsockets.
>
> But the actual problem is not in this number, but in the deeper logic.
> Previously we scanned the whole table; now we have 5 attempts to
> find out at least one bucket (without conflict) into which we will insert
> the new socket. Apparently, for a large number of addresses, it is possible
> that all 5 times we randomly select buckets that conflict.
> As a dumb solution we can increase the number of attempts to infinity, or
> fall back to a whole-table search after several random attempts, which is a
> bit more clever, I think.
>
Hmm, maybe I am blind, but in the case where the threshold is reached, we
don't have 5 attempts "to find out at least one bucket (without conflict)".
We just take the first entry from the random starting point, _without_
checking whether we have a conflict.
	if (net_eq(ib_net(tb), net) && tb->port == rover) {
		if (tb->fastreuse > 0 &&
		    sk->sk_reuse &&
		    sk->sk_state != TCP_LISTEN &&
		    (tb->num_owners < smallest_size || smallest_size == -1)) {
			smallest_size = tb->num_owners;
			smallest_rover = rover;
			if (atomic_read(&hashinfo->bsockets) > (high-low)+1) {
				spin_unlock(&head->lock);
				/* We select this without checking for conflicts. */
				snum = smallest_rover;
				goto have_snum;
			}
		}
Then we go to the "have_snum" label.
There we realize that (selected_IP, randomport) is already in use.
End of first try.

We redo the whole thing 5 times, so we only look at 5 slots out of
32000-64000.
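
To get a feeling for how easily 5 tries can all fail, here is a rough sketch. It assumes each try independently lands on a conflicting port with the same probability p_conflict, which is a simplification of the real rover walk. The helper name all_tries_fail is made up for illustration and is not a kernel function.

```c
/*
 * Probability that `tries` independent random picks all land on a
 * conflicting port, when each pick conflicts with probability
 * `p_conflict`. Hypothetical helper for illustration only.
 */
static double all_tries_fail(double p_conflict, int tries)
{
	double p = 1.0;
	int i;

	/* Independent tries: multiply the per-try failure probability. */
	for (i = 0; i < tries; i++)
		p *= p_conflict;
	return p;
}
```

Under that assumption, all_tries_fail(0.9, 5) is about 0.59: if 90% of the ports we can land on are already taken by the selected IP, bind() fails more often than not even though 10% of the range is still free.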
Maybe the fix would need to check whether there is a conflict before doing
the "goto have_snum":
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index e0a3e35..0498daf 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -120,9 +120,11 @@ again:
 						smallest_size = tb->num_owners;
 						smallest_rover = rover;
 						if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
-							spin_unlock(&head->lock);
-							snum = smallest_rover;
-							goto have_snum;
+							if (!inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
+								spin_unlock(&head->lock);
+								snum = smallest_rover;
+								goto have_snum;
+							}
 						}
 					}
 					goto next;
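
The difference between taking the first bucket blindly and checking for a conflict first can be modelled in userspace. This is only a toy sketch: struct toy_bucket and toy_pick() are invented names, the conflict flag stands in for whatever bind_conflict() would return, and fastreuse, num_owners and locking are all ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a bind bucket; not the kernel's struct inet_bind_bucket. */
struct toy_bucket {
	int port;
	bool conflict;	/* would (our IP, port) clash with an existing owner? */
};

/*
 * Walk the buckets once, starting at `start` and wrapping around.
 * check_conflict == false mimics the unpatched shortcut: the first
 * bucket is taken blindly. check_conflict == true mimics the proposed
 * check: conflicting buckets are skipped and the walk continues.
 * Returns the index of the chosen bucket, or -1 if none qualifies.
 */
static int toy_pick(const struct toy_bucket *tb, size_t n, size_t start,
		    bool check_conflict)
{
	size_t i;

	for (i = 0; i < n; i++) {
		size_t idx = (start + i) % n;

		if (check_conflict && tb[idx].conflict)
			continue;	/* keep scanning, like "goto next" */
		return (int)idx;
	}
	return -1;
}
```

With three buckets where only the last one is free, toy_pick(tb, 3, 0, false) returns index 0, a bucket that then fails the later conflict test, while toy_pick(tb, 3, 0, true) walks on and returns index 2.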