lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 21 Apr 2010 07:46:39 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Evgeniy Polyakov <zbr@...emap.net>
Cc:	Ben Greear <greearb@...delatech.com>,
	David Miller <davem@...emloft.net>,
	Gaspar Chilingarov <gasparch@...il.com>,
	netdev <netdev@...r.kernel.org>
Subject: Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge
 amount of outgoing connections (unable to bind ... )

Le mercredi 21 avril 2010 à 04:30 +0400, Evgeniy Polyakov a écrit :
> On Wed, Apr 21, 2010 at 02:05:14AM +0200, Eric Dumazet (eric.dumazet@...il.com) wrote:
> > I believe the bsockets 'optimization' is a bug, we should remove it.
> > 
> > This is a stable candidate (2.6.30+)
> > 
> > [PATCH net-next-2.6] tcp: remove bsockets count
> > 
> > Counting number of bound sockets to avoid a loop is buggy, since we cant
> > know how many IP addresses are in use. When threshold is reached, we try
> > 5 random slots and can fail while there are plenty available ports.
> 
> To return back to exponential bind() times you need to revert the whole
> original patch including magic 5 number, not only bsockets.
> 
> But actual problem is not in this digit, but in a deeper logic.
> Previously we scanned the whole table, now we have 5 attempts to
> find out at least one bucket (without conflict) we will insert
> new socket into. Apparently for large number of addresses it is possible
> that all 5 times we will randomly select those buckets which conflicts.
> As dumb solution we can increase 'attempt' number to infinite one, or
> fallback to whole-table-search after several random attempts, which is a
> bit more clever I think.
> 

Hmm, maybe I am blind, but on the case the threshold is reached, we dont
have 5 attempts "to find out at least one bucket (without conflict)"

We just take the first entry from the random starting point, _without_
checking we have a conflict.

if (net_eq(ib_net(tb), net) && tb->port == rover) {
	if (tb->fastreuse > 0 &&
	    sk->sk_reuse &&
	    sk->sk_state != TCP_LISTEN &&
	    (tb->num_owners < smallest_size || smallest_size == -1)) {
		smallest_size = tb->num_owners;
		smallest_rover = rover;
		if (atomic_read(&hashinfo->bsockets) > (high-low)+1) {
			spin_unlock(&head->lock);
			snum = smallest_rover;   // We select this, without checking for
conflicts.
			goto have_snum;
		}
	}


Then we goto to "have_snum" label

Then we realize (selected_IP, randomport) is already in use.
End of first try.

We redo the thing 5 times, so we only look at 5 slots out of
32000-64000.

Maybe the fix would need to check if there is a conflict before doing
the "goto have_snum"

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index e0a3e35..0498daf 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -120,9 +120,11 @@ again:
 						smallest_size = tb->num_owners;
 						smallest_rover = rover;
 						if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
-							spin_unlock(&head->lock);
-							snum = smallest_rover;
-							goto have_snum;
+							if (!inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb))
+								spin_unlock(&head->lock);
+								snum = smallest_rover;
+								goto have_snum;
+							}
 						}
 					}
 					goto next;


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ