Message-ID: <alpine.DEB.2.11.1804181007430.4316@blackhole.kfki.hu>
Date: Wed, 18 Apr 2018 10:13:02 +0200 (CEST)
From: Jozsef Kadlecsik <kadlec@...ckhole.kfki.hu>
To: Florian Westphal <fw@...len.de>
cc: Michal Kubecek <mkubecek@...e.cz>, netdev@...r.kernel.org,
Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: tcp hang when socket fills up ?
Hi,
On Tue, 17 Apr 2018, Florian Westphal wrote:
> Dominique Martinet <asmadeus@...ewreck.org> wrote:
>
> [ CC Jozsef ]
>
> > Could it have something to do with the way I set up the connection?
> > I don't think the "both remotes call connect() with carefully selected
> > source/dest port" is a very common case..
> >
> > If you look at the tcpdump outputs I attached the sequence usually is
> > something like
> > server > client SYN
> > client > server SYN
> > server > client SYNACK
> > client > server ACK
> >
> > ultimately it IS a connection, but with an extra SYN packet in front of
> > it (that first SYN opens up the conntrack of the nat so that the
> > client's syn can come in, the client's conntrack will be that of a
> > normal connection since its first SYN goes in directly after the
> > server's (it didn't see the server's SYN))
> >
> > Looking at my logs again, I'm seeing the same as you:
> >
> > This looks like the actual SYN/SYN/SYNACK/ACK:
> > - 14.364090 seq=505004283 likely SYN coming out of server
> > - 14.661731 seq=1913287797 on next line it says receiver
> > end=505004284 so likely the matching SYN from client
> > Which this time gets a proper SYNACK from server:
> > 14.662020 seq=505004283 ack=1913287798
> > And following final dataless ACK:
> > 14.687570 seq=1913287798 ack=505004284
> >
> > Then as you point out some data ACK, where the scale poofs:
> > 14.688762 seq=1913287798 ack=505004284+(0) sack=505004284+(0) win=229 end=1913287819
> > 14.688793 tcp_in_window: sender end=1913287798 maxend=1913316998 maxwin=29312 scale=7 receiver end=505004284 maxend=505033596 maxwin=29200 scale=7
> > 14.688824 tcp_in_window:
> > 14.688852 seq=1913287798 ack=505004284+(0) sack=505004284+(0) win=229 end=1913287819
> > 14.688882 tcp_in_window: sender end=1913287819 maxend=1913287819 maxwin=229 scale=0 receiver end=505004284 maxend=505033596 maxwin=29200 scale=7
> >
> > As you say, only tcp_options() will clear just one side of the scales.
> > We don't have sender->td_maxwin == 0 (printed) so I see no other way
> > than we are in the last else if:
> > - we have after(end, sender->td_end) (end=1913287819 > sender
> > end=1913287798)
> > - I assume the tcp state machine must be confused because of the
> > SYN/SYN/SYNACK/ACK pattern and we probably enter the next check,
> > but since this is a data packet it doesn't have the tcp option for scale
> > thus scale resets.
>
> Yes, this looks correct. Jozsef, can you please have a look?
>
> Problem seems to be that conntrack believes that ACK packet
> re-initializes the connection:
>
>     /*
>      * RFC 793: "if a TCP is reinitialized ... then it need
>      * not wait at all; it must only be sure to use sequence
>      * numbers larger than those recently used."
>      */
>     sender->td_end =
>     sender->td_maxend = end;
>     sender->td_maxwin = (win == 0 ? 1 : win);
>
>     tcp_options(skb, dataoff, tcph, sender);
>
> and last line clears the scale value (no wscale option in data packet).
>
>
> Transitions are:
> server > client SYN sNO -> sSS
> client > server SYN sSS -> sS2
> server > client SYNACK sS2 -> sSR /* here */
> client > server ACK sSR -> sES
>
> SYN/ACK was observed in original direction so we hit
> state->state == TCP_CONNTRACK_SYN_RECV && dir == IP_CT_DIR_REPLY test
> when we see the ack packet and end up in the 'TCP is reinitialized' branch.
>
> AFAICS, without this, connection would move to sES just fine,
> as the data ack is in window.
Yes, the state transition is wrong for simultaneous open, because the
tcp_conntracks table is not (and cannot be) smart enough to handle it.
Could you test the following untested patch?
diff --git a/include/uapi/linux/netfilter/nf_conntrack_tcp.h b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
index 74b9115..bcba72d 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_tcp.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
@@ -46,6 +46,9 @@ enum tcp_conntrack {
 /* Marks possibility for expected RFC5961 challenge ACK */
 #define IP_CT_EXP_CHALLENGE_ACK 0x40
 
+/* Simultaneous open initialized */
+#define IP_CT_TCP_SIMULTANEOUS_OPEN 0x80
+
 struct nf_ct_tcp_flags {
 	__u8 flags;
 	__u8 mask;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index e97cdc1..8e67910 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -981,6 +981,17 @@ static int tcp_packet(struct nf_conn *ct,
 			return NF_ACCEPT; /* Don't change state */
 		}
 		break;
+	case TCP_CONNTRACK_SYN_SENT2:
+		/* tcp_conntracks table is not smart enough to handle
+		 * simultaneous open.
+		 */
+		ct->proto.tcp.last_flags |= IP_CT_TCP_SIMULTANEOUS_OPEN;
+		break;
+	case TCP_CONNTRACK_SYN_RECV:
+		if (dir == IP_CT_DIR_REPLY && index == TCP_ACK_SET &&
+		    ct->proto.tcp.last_flags & IP_CT_TCP_SIMULTANEOUS_OPEN)
+			new_state = TCP_CONNTRACK_ESTABLISHED;
+		break;
 	case TCP_CONNTRACK_CLOSE:
 		if (index == TCP_RST_SET
 		    && (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_MAXACK_SET)
Best regards,
Jozsef
-
E-mail : kadlec@...ckhole.kfki.hu, kadlecsik.jozsef@...ner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
H-1525 Budapest 114, POB. 49, Hungary