Date:   Tue, 29 Mar 2022 20:48:02 -0700
From:   Eric Dumazet <>
To:     Jaco Kroon <>
Cc:     Neal Cardwell <>,
        LKML <>,
        Netdev <>,
        Yuchung Cheng <>
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections

On Tue, Mar 29, 2022 at 7:58 PM Jaco Kroon <> wrote:
> Hi Neal,
> > Thanks for the report!  I have CC-ed the netdev list, since it is
> > probably a better forum for this discussion.
> Awesome thank you.
> >
> > Can you please attach (or link to) a tcpdump raw .pcap file  (produced
> > with the -w flag)? There are a number of tools that will make this
> > easier to visualize and analyze if we can see the raw .pcap file. You
> > may want to anonymize the trace and/or capture just headers, etc (for
> > example, the -s flag can control how much of each packet tcpdump
> > grabs).
> Attached.
> The traffic itself should be mostly encrypted but stripped with -s100
> anyway.  At this point SACK was still on.
> I don't know how, or why, but this relates to TFO.  After sending the
> report, on a hunch I compared the exim logs of a successful delivery
> with those of an unsuccessful one, and the only difference was that the
> non-working one stated:
> TFO mode sendto, no data: EINPROGRESS
> and then specifically:
> TCP_FASTOPEN tcpi_unacked 2
> The working connections never had the latter line in the output.
> The moment I set sysctl -w net.ipv4.tcp_fastopen=0 (default is 1) I've
> managed to flood out about 1200 emails to google in a matter of no more
> than 15 minutes.
> In the kernel sources:  git log v5.8..v5.17 net/
> And searching for TFO only gives so many possible commits that broke
> this, just looking at changelogs I'm not sure if any of them are
> relevant.  I'm guessing the issue possibly relates to congestion
> control, as such this is probably the most relevant:
> commit be5d1b61a2ad28c7e57fe8bfa277373e8ecffcdc
> Author: Nguyen Dinh Phi <>
> Date:   Tue Jul 6 07:19:12 2021 +0800
>     tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized
> Just looking at the diff it removes an icsk->icsk_ca_initialized = 0; -
> the only other place this gets set to 0 is in tcp_disconnect() ... and
> to 1 in tcp_init_congestion_control() - so I think we might have an
> uninitialized variable here ... then again tcp_init_socket mentions
> explicitly that sk_alloc set lots of stuff to 0 - still bugs me that the
> original commit (8919a9b31eb4) felt the need to set an explicit 0 in
> tcp_init_transfer().

I do not think this commit is related to the issue you have.

I guess you could try a revert ?

Then, if you think old linux versions were ok, start a bisection ?

Thank you.

(I do not see why a successful TFO would lead to a freeze after ~70 KB
of data has been sent)
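
For anyone reproducing the workaround quoted above: net.ipv4.tcp_fastopen is a
bitmask rather than a plain on/off switch, so it can help to decode the value
before and after changing it. The following is a minimal sketch in Python; the
helper names (decode_tcp_fastopen, read_tcp_fastopen) are hypothetical and not
part of any tool mentioned in this thread, and the bit descriptions are
paraphrased from the kernel's ip-sysctl documentation.

```python
# Sketch: decode the net.ipv4.tcp_fastopen bitmask.
# Helper names are illustrative assumptions, not an existing API.
# Bit meanings paraphrased from Documentation/networking/ip-sysctl.rst.

TFO_FLAGS = {
    0x1: "client: enable sending data in the opening SYN",
    0x2: "server: enable Fast Open support for listeners",
    0x4: "client: send data in SYN regardless of cookie availability",
    0x200: "server: accept data-in-SYN without any cookie option",
    0x400: "server: enable Fast Open on all listeners by default",
}


def decode_tcp_fastopen(value):
    """Return descriptions of the known bits set in `value`, lowest bit first."""
    return [desc for bit, desc in sorted(TFO_FLAGS.items()) if value & bit]


def read_tcp_fastopen(path="/proc/sys/net/ipv4/tcp_fastopen"):
    """Read the current sysctl value from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        current = read_tcp_fastopen()
    except OSError as exc:
        print(f"could not read tcp_fastopen: {exc}")
    else:
        print(f"net.ipv4.tcp_fastopen = {current:#x}")
        for line in decode_tcp_fastopen(current) or ["Fast Open disabled"]:
            print("  " + line)
```

With the default of 1 only the client bit is set, which matches the report:
setting the value to 0 turns off client-side TFO for outgoing connections.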

> >
> > Can you please share the exact kernel version of the client machine?
> Our side (client) is 5.17.1 (side that initiates TCP/IP connection), I
> obviously can't comment for the Google side (server).
> > Also, can you please summarize/clarify whether you think the client,
> > server, or both are misbehaving?
> client is re-transmitting frames for which it has already received an
> ACK from the server.  In pcap from frames 105 onwards one can start
> seeing retransmits, then first "spurious retransmission" as wireshark
> labels it from frames 122 onwards.
> Kind Regards,
> Jaco
