Date: Sat, 2 Apr 2022 14:04:08 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Jaco Kroon <jaco@....co.za>, Florian Westphal <fw@...len.de>,
	LKML <linux-kernel@...r.kernel.org>, Netdev <netdev@...r.kernel.org>,
	Yuchung Cheng <ycheng@...gle.com>, Wei Wang <weiwan@...gle.com>
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections

On Sat, Apr 2, 2022 at 12:32 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Sat, Apr 2, 2022 at 9:29 AM Neal Cardwell <ncardwell@...gle.com> wrote:
> >
> > FWIW those log entries indicate netfilter on the mail client machine
> > dropping consecutive outbound skbs with 2*MSS of payload. So that
> > explains the large consecutive losses of client data packets to the
> > e-mail server. That seems to confirm my earlier hunch that those drops
> > of consecutive client data packets "do not look like normal congestive
> > packet loss".
>
> This also explains why we have all these tiny 2-MSS packets in the pcap.
>
> Under normal conditions, autocorking should kick in, allowing TCP to
> build bigger TSO packets.

I have not looked at the conntrack code before today, but AFAICT this
is the buggy section of nf_conntrack_proto_tcp.c:

	} else if (((state->state == TCP_CONNTRACK_SYN_SENT
		     && dir == IP_CT_DIR_ORIGINAL)
		    || (state->state == TCP_CONNTRACK_SYN_RECV
			&& dir == IP_CT_DIR_REPLY))
		   && after(end, sender->td_end)) {
		/*
		 * RFC 793: "if a TCP is reinitialized ... then it need
		 * not wait at all; it must only be sure to use sequence
		 * numbers larger than those recently used."
		 */
		sender->td_end =
		sender->td_maxend = end;
		sender->td_maxwin = (win == 0 ? 1 : win);

		tcp_options(skb, dataoff, tcph, sender);

Note that the tcp_options() function implicitly assumes it is being
called on a SYN, because it sets state->td_scale to 0 and only sets
state->td_scale to something non-zero if it sees a wscale option. So
if we ever call that on an skb that's not a SYN, we will forget that
the connection is using the wscale option.

But at this point in the code it is calling tcp_options() without
first checking that this is a SYN.

In a TFO scenario like the one in the trace, the server sends its
first data packet after the SYNACK packet and before the client's
first ACK. Presumably the conntrack state machine is (correctly) in
SYN_RECV, and then (incorrectly) executes this code, including the
call to tcp_options(), on that first data packet, which has no SYN
bit and no wscale option. Thus tcp_options() zeroes out the server's
sending state td_scale and does not set it to a non-zero value.

So now conntrack thinks the server is not using the wscale option.
When conntrack interprets future receive windows from the server, it
does not scale them (via: win <<= sender->td_scale;), so in this
scenario the estimated right edge of the server's receive window
(td_maxend) is never advanced past the roughly 64KB value offered in
the SYN.

Thus when the client sends data packets beyond 64KBytes, conntrack
declares them invalid and drops them, due to failing the condition
Eric noted above:

	before(seq, sender->td_maxend + 1)

This explains my previous observation that the client's original data
packet transmissions are always dropped after the first 64KBytes.
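
To make the td_scale failure concrete, here is a minimal userspace
sketch of the parsing behavior described above (my paraphrase, not
the verbatim kernel parser): the scale is reset unconditionally, and
only restored if this particular packet carries a wscale option:

#include <stdio.h>
#include <stdint.h>

#define TCPOPT_WINDOW	3
#define TCPOLEN_WINDOW	3

struct sketch_state {
	uint8_t td_scale;
};

/* Paraphrase of the tcp_options() behavior: reset the scale, then
 * only set it again if *this* packet carries a wscale option. A
 * non-SYN data packet carries none, so the scale learned from the
 * handshake is forgotten. */
static void parse_options_sketch(const uint8_t *opt, int len,
				 struct sketch_state *state)
{
	state->td_scale = 0;	/* unconditional reset, as in the kernel */

	while (len >= 2) {
		uint8_t kind = opt[0];
		uint8_t size = opt[1];

		if (kind == 0)		/* TCPOPT_EOL */
			break;
		if (kind == 1) {	/* TCPOPT_NOP */
			opt++;
			len--;
			continue;
		}
		if (size < 2 || size > len)
			break;
		if (kind == TCPOPT_WINDOW && size == TCPOLEN_WINDOW)
			state->td_scale = opt[2];
		opt += size;
		len -= size;
	}
}

int main(void)
{
	struct sketch_state s = { 0 };
	/* SYN options: NOP, NOP, wscale 7 */
	const uint8_t syn_opts[]  = { 1, 1, TCPOPT_WINDOW, TCPOLEN_WINDOW, 7 };
	/* first TFO data packet from the server: no wscale option */
	const uint8_t data_opts[] = { 1, 1 };

	parse_options_sketch(syn_opts, sizeof(syn_opts), &s);
	printf("after SYN:         td_scale = %u\n", s.td_scale);  /* 7 */

	parse_options_sketch(data_opts, sizeof(data_opts), &s);
	printf("after data packet: td_scale = %u\n", s.td_scale);  /* 0 */
	return 0;
}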
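
And the window math that follows from the forgotten scale, with
hypothetical numbers (a server ACK at 1000, a raw advertised window
of 502, and wscale 7; the real trace values will differ):

#include <stdio.h>
#include <stdint.h>

/* the same wrap-safe sequence comparison the kernel uses */
static int before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

int main(void)
{
	uint32_t ack = 1000;	/* hypothetical: server's latest ACK */
	uint32_t raw_win = 502;	/* hypothetical raw window from the server */
	uint32_t seq = 64000;	/* hypothetical client data near 64KB out */

	/* With td_scale = 7 remembered, the right edge stays ahead of
	 * the client's data; with td_scale wiped to 0 it does not, and
	 * the packet is declared invalid and dropped. */
	uint32_t maxend_scaled   = ack + (raw_win << 7);  /* 65256 */
	uint32_t maxend_unscaled = ack + (raw_win << 0);  /*  1502 */

	printf("scale kept:      valid = %d\n", before(seq, maxend_scaled + 1));   /* 1 */
	printf("scale forgotten: valid = %d\n", before(seq, maxend_unscaled + 1)); /* 0 */
	return 0;
}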
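
Someone more familiar with conntrack may have a good idea about how
best to fix this? One naive possibility (untested, and folks who know
this code better may prefer a different placement) would be to only
re-parse options here when the packet really is a SYN, along the
lines of:

		sender->td_end =
		sender->td_maxend = end;
		sender->td_maxwin = (win == 0 ? 1 : win);

		/* Only a SYN can carry a wscale option; don't let a
		 * non-SYN (e.g. the first TFO server data packet)
		 * wipe the scale learned during the handshake.
		 */
		if (tcph->syn)
			tcp_options(skb, dataoff, tcph, sender);

neal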