lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 6 Apr 2022 21:04:50 +0200 (CEST)
From:   Jozsef Kadlecsik <kadlec@...filter.org>
To:     Florian Westphal <fw@...len.de>
cc:     Neal Cardwell <ncardwell@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Jaco Kroon <jaco@....co.za>, netfilter-devel@...r.kernel.org,
        netdev@...r.kernel.org
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP
 connections

Hi Florian,

On Wed, 6 Apr 2022, Florian Westphal wrote:

> Neal Cardwell <ncardwell@...gle.com> wrote:
> 
> [ trimmed CCs, add Jozsef and nf-devel ]
> 
> Neal, Eric, thanks for debugging this problem.
> 
> > On Sat, Apr 2, 2022 at 12:32 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > On Sat, Apr 2, 2022 at 9:29 AM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > FWIW those log entries indicate netfilter on the mail client machine
> > > > dropping consecutive outbound skbs with 2*MSS of payload. So that
> > > > explains the large consecutive losses of client data packets to the
> > > > e-mail server. That seems to confirm my earlier hunch that those drops
> > > > of consecutive client data packets "do not look like normal congestive
> > > > packet loss".
> > >
> > > This also explains why we have all these tiny 2-MSS packets in the pcap.
> > > Under normal conditions, autocorking should kick in, allowing TCP to
> > > build bigger TSO packets.
> > 
> > I have not looked at the conntrack code before today, but AFAICT this
> > is the buggy section of  nf_conntrack_proto_tcp.c:
> > 
> >         } else if (((state->state == TCP_CONNTRACK_SYN_SENT
> >                      && dir == IP_CT_DIR_ORIGINAL)
> >                    || (state->state == TCP_CONNTRACK_SYN_RECV
> >                      && dir == IP_CT_DIR_REPLY))
> >                    && after(end, sender->td_end)) {
> >                 /*
> >                  * RFC 793: "if a TCP is reinitialized ... then it need
> >                  * not wait at all; it must only be sure to use sequence
> >                  * numbers larger than those recently used."
> >                  */
> >                 sender->td_end =
> >                 sender->td_maxend = end;
> >                 sender->td_maxwin = (win == 0 ? 1 : win);
> > 
> >                 tcp_options(skb, dataoff, tcph, sender);
> > 
> > Note that the tcp_options() function implicitly assumes it is being
> > called on a SYN, because it sets state->td_scale to 0 and only sets
> > state->td_scale to something non-zero if it sees a wscale option. So
> > if we ever call that on an skb that's not a SYN, we will forget that
> > the connection is using the wscale option.
> >
> > But at this point in the code it is calling tcp_options() without
> > first checking that this is a SYN.
> 
> Yes, thats the bug, tcp_options() must not be called if syn bit is not
> set.
> 
> > For this TFO scenario like the one in the trace, where the server
> > sends its first data packet after the SYNACK packet and before the
> > client's first ACK, presumably the conntrack state machine is
> > (correctly) SYN_RECV, and then (incorrectly) executes this code,
> 
> Right.  Jozsef, for context, sequence is in trace is:
> 
> S > C Flags [S], seq 3451342529, win 62580, options [mss 8940,sackOK,TS val 331187616 ecr 0,nop,wscale 7,tfo [|tcp]>
> C > S Flags [S.], seq 2699962254, ack 3451342530, win 65535, options [mss 1440,sackOK,TS val 1206542770 ecr 331187616,nop,wscale 8], length 0
> C > S Flags [P.], seq 1:89, ack 1, win 256, options [nop,nop,TS val 1206542772 ecr 331187616], length 88: SMTP [|smtp]
> 
> Normally, 3rd packet would be S > C, but this one is C > S.
> 
> So, packet #3 hits the 'reinit' branch which zaps wscale option.
> 
> > Someone more familiar with conntrack may have a good idea about how to
> > best fix this?
> 
> Jozsef, does this look sane to you?
> It fixes the TFO capture and still passes the test case i made for
> 82b72cb94666b3dbd7152bb9f441b068af7a921b
> ("netfilter: conntrack: re-init state for retransmitted syn-ack").

As far as I see it'd break simultaneous open because after(end, 
sender->td_end) is called in the new condition:

> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> index 8ec55cd72572..90ad1c0f23b1 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> @@ -556,33 +556,24 @@ static bool tcp_in_window(struct nf_conn *ct,
>  			}
>  
>  		}
> -	} else if (((state->state == TCP_CONNTRACK_SYN_SENT
> -		     && dir == IP_CT_DIR_ORIGINAL)
> -		   || (state->state == TCP_CONNTRACK_SYN_RECV
> -		     && dir == IP_CT_DIR_REPLY))
> -		   && after(end, sender->td_end)) {
> +	} else if (tcph->syn &&
> +		   after(end, sender->td_end) &&
> +		   (state->state == TCP_CONNTRACK_SYN_SENT ||
> +		    state->state == TCP_CONNTRACK_SYN_RECV)) {
>  		/*
>  		 * RFC 793: "if a TCP is reinitialized ... then it need
>  		 * not wait at all; it must only be sure to use sequence
>  		 * numbers larger than those recently used."
> -		 */
> -		sender->td_end =
> -		sender->td_maxend = end;
> -		sender->td_maxwin = (win == 0 ? 1 : win);
> -
> -		tcp_options(skb, dataoff, tcph, sender);
> -	} else if (tcph->syn && dir == IP_CT_DIR_REPLY &&
> -		   state->state == TCP_CONNTRACK_SYN_SENT) {
> -		/* Retransmitted syn-ack, or syn (simultaneous open).
>  		 *
> +		 * also check for retransmitted syn-ack, or syn (simultaneous open).
>  		 * Re-init state for this direction, just like for the first
>  		 * syn(-ack) reply, it might differ in seq, ack or tcp options.
> +		 *
> +		 * Check for invalid syn-ack in original direction was already done.
>  		 */
>  		tcp_init_sender(sender, receiver,
>  				skb, dataoff, tcph,
>  				end, win);
> -		if (!tcph->ack)
> -			return true;
>  	}
>  
>  	if (!(tcph->ack)) {
> 

I'd merge the two conditions so that it'd cover both original condition 
branches:

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 8ec55cd72572..87375ce2f995 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -556,33 +556,26 @@ static bool tcp_in_window(struct nf_conn *ct,
 			}
 
 		}
-	} else if (((state->state == TCP_CONNTRACK_SYN_SENT
-		     && dir == IP_CT_DIR_ORIGINAL)
-		   || (state->state == TCP_CONNTRACK_SYN_RECV
-		     && dir == IP_CT_DIR_REPLY))
-		   && after(end, sender->td_end)) {
+	} else if (tcph->syn &&
+		   ((after(end, sender->td_end) &&
+		     (state->state == TCP_CONNTRACK_SYN_SENT ||
+		      state->state == TCP_CONNTRACK_SYN_RECV)) ||
+		    (dir == IP_CT_DIR_REPLY &&
+		     state->state == TCP_CONNTRACK_SYN_SENT))) {
 		/*
 		 * RFC 793: "if a TCP is reinitialized ... then it need
 		 * not wait at all; it must only be sure to use sequence
 		 * numbers larger than those recently used."
-		 */
-		sender->td_end =
-		sender->td_maxend = end;
-		sender->td_maxwin = (win == 0 ? 1 : win);
-
-		tcp_options(skb, dataoff, tcph, sender);
-	} else if (tcph->syn && dir == IP_CT_DIR_REPLY &&
-		   state->state == TCP_CONNTRACK_SYN_SENT) {
-		/* Retransmitted syn-ack, or syn (simultaneous open).
 		 *
+		 * also check for retransmitted syn-ack, or syn (simultaneous open).
 		 * Re-init state for this direction, just like for the first
 		 * syn(-ack) reply, it might differ in seq, ack or tcp options.
+		 *
+		 * Check for invalid syn-ack in original direction was already done.
 		 */
 		tcp_init_sender(sender, receiver,
 				skb, dataoff, tcph,
 				end, win);
-		if (!tcph->ack)
-			return true;
 	}
 
 	if (!(tcph->ack)) {

What do you think?

Best regards,
Jozsef
-
E-mail  : kadlec@...ckhole.kfki.hu, kadlecsik.jozsef@...ner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary

Powered by blists - more mailing lists