Open Source and information security mailing list archives
Message-ID: <CANn89iKHbmVYoBdo2pCQWTzB4eFBjqAMdFbqL5EKSFqgg3uAJQ@mail.gmail.com>
Date: Tue, 29 Mar 2022 20:48:02 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Jaco Kroon <jaco@....co.za>
Cc: Neal Cardwell <ncardwell@...gle.com>, LKML <linux-kernel@...r.kernel.org>, Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections

On Tue, Mar 29, 2022 at 7:58 PM Jaco Kroon <jaco@....co.za> wrote:
>
> Hi Neal,
>
> > Thanks for the report! I have CC-ed the netdev list, since it is
> > probably a better forum for this discussion.
> Awesome thank you.
> >
> > Can you please attach (or link to) a tcpdump raw .pcap file (produced
> > with the -w flag)? There are a number of tools that will make this
> > easier to visualize and analyze if we can see the raw .pcap file. You
> > may want to anonymize the trace and/or capture just headers, etc (for
> > example, the -s flag can control how much of each packet tcpdump
> > grabs).
>
> Attached.
>
> The traffic itself should be mostly encrypted but stripped with -s100
> anyway. At this point SACK was still on.
>
> I don't know how, or why, but this relates to TFO. After sending report
> on a hunch (based on comparing the exim logs of a successful delivery
> compared to a non-successful) and the only difference was that the
> non-working was stating:
>
> TFO mode sendto, no data: EINPROGRESS
>
> and then specifically:
>
> TCP_FASTOPEN tcpi_unacked 2
>
> The working connections never had the latter line in the output.
>
> The moment I set sysctl -w net.ipv4.tcp_fastopen=0 (default is 1) I've
> managed to flood out about 1200 emails to google in a matter of no more
> than 15 minutes.
>
> In the kernel sources: git log v5.8..v5.17 net/
>
> And searching for TFO only gives so many possible commits that broke
> this, just looking at changelogs I'm not sure if any of them are
> relevant. I'm guessing the issue possibly relates to congestion
> control, as such this is probably the most relevant:
>
> commit be5d1b61a2ad28c7e57fe8bfa277373e8ecffcdc
> Author: Nguyen Dinh Phi <phind.uet@...il.com>
> Date: Tue Jul 6 07:19:12 2021 +0800
>
>     tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized
>
> Just looking at the diff it removes a icsk->icsk_ca_initialized = 0; -
> the only other place this gets set to 0 is in tcp_disconnect() ... and
> to 1 in tcp_init_congestion_control() - so I think we might have an
> uninitialized variable here ... then again tcp_init_socket mentions
> explicitly that sk_alloc set lots of stuff to 0 - still bugs me that the
> original commit (8919a9b31eb4) felt the need to set an explicit 0 in
> tcp_init_transfer().

I do not think this commit is related to the issue you have.

I guess you could try a revert ?

Then, if you think old linux versions were ok, start a bisection ?

Thank you.

(I do not see why a successful TFO would lead to a freeze after ~70 KB
of data has been sent)

>
> >
> > Can you please share the exact kernel version of the client machine?
> Our side (client) is 5.17.1 (side that initiates TCP/IP connection), I
> obviously can't comment for the Google side (server).
> > Also, can you please summarize/clarify whether you think the client,
> > server, or both are misbehaving?
>
> client is re-transmitting frames for which it has already received an
> ACK from the server. In pcap from frames 105 onwards one can start
> seeing retransmits, then first "spurious retransmission" as wireshark
> labels it from frames 122 onwards.
>
> Kind Regards,
> Jaco
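
For reference, the net.ipv4.tcp_fastopen value toggled in the quoted report is a bitmask rather than a plain on/off switch, which is why the default of 1 enables client-side TFO only. Below is a minimal Python sketch for decoding it. The flag meanings follow the kernel's ip-sysctl documentation; the helper names (decode_tcp_fastopen, client_tfo_enabled) are made up for illustration and are not part of any kernel or library API.

```python
from pathlib import Path
from typing import List

# Bit meanings as described in the kernel's ip-sysctl documentation.
TFO_FLAGS = {
    0x1: "client: send data in the opening SYN",
    0x2: "server: accept data in the SYN",
    0x4: "client: send data in SYN regardless of cookie availability",
    0x200: "server: accept data in SYN without a cookie option",
    0x400: "server: enable Fast Open on all listeners by default",
}

SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_fastopen")


def decode_tcp_fastopen(value: int) -> List[str]:
    """Return descriptions of the flags set in a tcp_fastopen value.

    Hypothetical helper, for illustration only.
    """
    return [desc for bit, desc in TFO_FLAGS.items() if value & bit]


def client_tfo_enabled(value: int) -> bool:
    """True if the client-side bit (0x1) is set."""
    return bool(value & 0x1)


if __name__ == "__main__":
    # Only meaningful on a Linux host that exposes this sysctl.
    if SYSCTL_PATH.exists():
        current = int(SYSCTL_PATH.read_text().strip())
        print(f"tcp_fastopen = {current:#x}")
        for line in decode_tcp_fastopen(current):
            print("  " + line)
        print("client TFO enabled:", client_tfo_enabled(current))
```

With the workaround from the report applied (tcp_fastopen=0), decode_tcp_fastopen returns an empty list and client_tfo_enabled returns False, so outgoing connections fall back to a regular three-way handshake.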