[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4B163226.50801@gmail.com>
Date: Wed, 02 Dec 2009 10:23:50 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
CC: kdakhane@...il.com, netdev@...r.kernel.org,
netfilter@...r.kernel.org, zbr@...emap.net
Subject: Re: [PATCH] tcp: Fix a connect() race with timewait sockets
David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Tue, 01 Dec 2009 16:00:39 +0100
>
>> [PATCH] tcp: Fix a connect() race with timewait sockets
>
> This condition would only trigger if the timewait recycling sysctl is
> enabled.
>
> It is off by default, and I can't find any mention in this bug report
> that it has been turned on.
Very true. I know nothing about context of the reporter, he didnt
answered to my queries.
Yes, if sysctl_tw_reuse is set, bug can triggers without any extra conditions.
But even if sysctl_tw_reuse is cleared, we might trigger the bug if
local port is bound to a value.
[User application called bind( port=XXX) before connect() ]
__inet_hash_connect() can indeed call check_established(... twp = NULL)
...
head = &hinfo->bhash[inet_bhashfn(net, snum, hinfo->bhash_size)];
tb = inet_csk(sk)->icsk_bind_hash;
spin_lock_bh(&head->lock);
if (sk_head(&tb->owners) == sk && !sk->sk_bind_node.next) {
hash(sk);
spin_unlock_bh(&head->lock);
return 0;
} else {
spin_unlock(&head->lock);
/* No definite answer... Walk to established hash table */
ret = check_established(death_row, sk, snum, NULL); <<< HERE >>>
out:
local_bh_enable();
return ret;
}
In this case, we call tcp_twsk_unique() with twp = NULL,
this bypass the sysctl_tcp_tw_reuse test.
int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
{
const struct tcp_timewait_sock *tcptw = tcp_twsk(sktw);
struct tcp_sock *tp = tcp_sk(sk);
/* With PAWS, it is safe from the viewpoint
of data integrity. Even without PAWS it is safe provided sequence
spaces do not overlap i.e. at data rates <= 80Mbit/sec.
Actually, the idea is close to VJ's one, only timestamp cache is
held not per host, but per port pair and TW bucket is used as state
holder.
If TW bucket has been already destroyed we fall back to VJ's scheme
and use initial timestamp retrieved from peer table.
*/
if (tcptw->tw_ts_recent_stamp &&
<<HERE>> (twp == NULL || (sysctl_tcp_tw_reuse &&
get_seconds() - tcptw->tw_ts_recent_stamp > 1))) {
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists