netdev - Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections

Message-ID: <eaf54cab-f852-1499-95e2-958af8be7085@uls.co.za>
Date:   Wed, 30 Mar 2022 04:58:04 +0200
From:   Jaco Kroon <jaco@....co.za>
To:     Neal Cardwell <ncardwell@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP
 connections

Hi Neal,

> Thanks for the report!  I have CC-ed the netdev list, since it is
> probably a better forum for this discussion.
Awesome, thank you.
>
> Can you please attach (or link to) a tcpdump raw .pcap file  (produced
> with the -w flag)? There are a number of tools that will make this
> easier to visualize and analyze if we can see the raw .pcap file. You
> may want to anonymize the trace and/or capture just headers, etc (for
> example, the -s flag can control how much of each packet tcpdump
> grabs).

Attached.

The traffic itself should be mostly encrypted, but I stripped it with
-s100 anyway.  At this point SACK was still on.

I don't know how, or why, but this relates to TFO.  After sending the
report I followed a hunch and compared the exim logs of a successful
delivery with those of an unsuccessful one.  The only difference was
that the non-working connection logged:

TFO mode sendto, no data: EINPROGRESS

and then specifically:

TCP_FASTOPEN tcpi_unacked 2

The working connections never had the latter line in the output.
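
For anyone wanting to check the same counter outside of exim, here is a minimal sketch that reads tcpi_unacked from a connected socket via the TCP_INFO socket option. It assumes Linux and the usual struct tcp_info layout (eight one-byte fields followed by 32-bit counters). The helper names and the 104-byte buffer size are illustrative choices, not anything exim itself uses.

```python
import socket
import struct

# Offset of tcpi_unacked in Linux's struct tcp_info: eight one-byte fields,
# then the 32-bit tcpi_rto, tcpi_ato, tcpi_snd_mss and tcpi_rcv_mss.
TCPI_UNACKED_OFFSET = 8 + 4 * 4


def parse_tcpi_unacked(raw: bytes) -> int:
    """Extract tcpi_unacked from a raw struct tcp_info buffer."""
    (unacked,) = struct.unpack_from("=I", raw, TCPI_UNACKED_OFFSET)
    return unacked


def read_tcpi_unacked(sock: socket.socket) -> int:
    """Ask the kernel for TCP_INFO on a connected TCP socket (Linux only)."""
    # 104 bytes is an assumed buffer size; the kernel copies at most this much.
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return parse_tcpi_unacked(raw)
```

Calling read_tcpi_unacked() right after connecting would give the number of segments sent but not yet acknowledged, which is what the "tcpi_unacked 2" log line reports.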

As soon as I set sysctl -w net.ipv4.tcp_fastopen=0 (the default is 1), I
managed to flood out about 1200 emails to Google in no more than 15
minutes.
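
The tcp_fastopen sysctl is a bitmask, so 1 means client-side TFO only and 0 disables it entirely. The sketch below decodes the value using the bit meanings from the kernel's ip-sysctl documentation. The function names and the short descriptions are my own, and only a subset of the documented bits is listed.

```python
from pathlib import Path
from typing import List

# Bit meanings paraphrased from the kernel's ip-sysctl documentation.
TFO_FLAGS = {
    0x1: "client support enabled",
    0x2: "server support enabled",
    0x4: "client sends data in SYN regardless of cookie availability",
    0x200: "server accepts data-in-SYN without a cookie",
    0x400: "all listeners support TFO by default",
}


def decode_tcp_fastopen(value: int) -> List[str]:
    """Return a description for every known bit set in the sysctl value."""
    return [desc for bit, desc in TFO_FLAGS.items() if value & bit]


def read_tcp_fastopen(path: str = "/proc/sys/net/ipv4/tcp_fastopen") -> int:
    """Read the current sysctl value from procfs (Linux only)."""
    return int(Path(path).read_text().strip())
```

On a machine with the default setting, decode_tcp_fastopen(read_tcp_fastopen()) should list only the client-side entry.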

In the kernel sources:  git log v5.8..v5.17 net/

Searching that log for TFO turns up only a limited number of commits
that could have broken this, but from the changelogs alone I'm not sure
any of them are relevant.  My guess is that the issue relates to
congestion control, in which case this is probably the most relevant:

commit be5d1b61a2ad28c7e57fe8bfa277373e8ecffcdc
Author: Nguyen Dinh Phi <phind.uet@...il.com>
Date:   Tue Jul 6 07:19:12 2021 +0800

    tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized

Just looking at the diff, it removes an icsk->icsk_ca_initialized = 0;
assignment.  The only other place this gets set to 0 is in
tcp_disconnect(), and it is set to 1 in tcp_init_congestion_control(),
so I think we might have an uninitialized variable here.  Then again,
tcp_init_sock() mentions explicitly that sk_alloc() sets lots of stuff
to 0.  It still bugs me that the original commit (8919a9b31eb4) felt the
need to set an explicit 0 in tcp_init_transfer().

>
> Can you please share the exact kernel version of the client machine?
Our side (the client, i.e. the side that initiates the TCP/IP
connection) is 5.17.1.  I obviously can't comment on the Google side
(server).
> Also, can you please summarize/clarify whether you think the client,
> server, or both are misbehaving?

The client is retransmitting frames for which it has already received an
ACK from the server.  In the pcap, retransmits start showing up from
frame 105 onwards, and the first "spurious retransmission", as Wireshark
labels it, appears from frame 122 onwards.
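
To make "retransmitting already ACKed data" concrete, here is a rough sketch of the check. It is a simplified approximation, not Wireshark's actual heuristic: it works on relative sequence numbers and ignores wraparound, SACK blocks and SYN/FIN accounting. The Segment type and function name are hypothetical, and the input would have to be extracted from the capture separately.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    frame: int         # frame number in the capture
    from_client: bool  # True if sent by the client, False if by the server
    seq: int           # relative sequence number of the first payload byte
    length: int        # payload length in bytes
    ack: int           # relative acknowledgment number


def find_already_acked_retransmits(segments: Iterable[Segment]) -> List[int]:
    """Return frame numbers of client data the server had already ACKed."""
    highest_ack = 0  # highest cumulative ACK seen from the server so far
    flagged = []
    for seg in segments:
        if seg.from_client:
            # Data lying entirely below the cumulative ACK was acknowledged
            # before this segment was sent, so sending it again is spurious.
            if seg.length > 0 and seg.seq + seg.length <= highest_ack:
                flagged.append(seg.frame)
        else:
            highest_ack = max(highest_ack, seg.ack)
    return flagged
```

A client segment is flagged only when all of its payload lies below the highest ACK the server has sent, which matches the behaviour described above.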

Kind Regards,
Jaco

Download attachment "iewc_google2.pcap" of type "application/vnd.tcpdump.pcap" (19828 bytes)