Date:   Wed, 30 Mar 2022 09:56:45 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Jaco Kroon <jaco@....co.za>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections

On Wed, Mar 30, 2022 at 2:22 AM Jaco Kroon <jaco@....co.za> wrote:
>
> Hi Eric,
>
> On 2022/03/30 05:48, Eric Dumazet wrote:
> > On Tue, Mar 29, 2022 at 7:58 PM Jaco Kroon <jaco@....co.za> wrote:
> >
> > I do not think this commit is related to the issue you have.
> >
> > I guess you could try a revert ?
> >
> > Then, if you think old linux versions were ok, start a bisection ?
> That'll be interesting, will see if I can reproduce on a non-production
> host.
> >
> > Thank you.
> >
> > (I do not see why a successful TFO would lead to a freeze after ~70 KB
> > of data has been sent)
>
> I do actually agree with this in that it makes no sense, but disabling
> TFO definitely resolved the issue for us.
>
> Kind Regards,
> Jaco

Thanks for the pcap trace! That's a pretty strange trace. I agree with
Eric's theory that this looks like one or more bugs in a firewall,
middlebox, or netfilter rule. From the trace it looks like the buggy
component is sometimes dropping packets and sometimes corrupting them
so that the client's TCP stack ignores them.

Interestingly, in that trace the client SYN has a TFO option and
cookie, but no data in the SYN.

The last packet that looks sane/normal is the ACK from the SMTP server
that looks like:

00:00:00.000010 IP6 2a00:1450:4013:c16::1a.25 > 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.48590: . 6260:6260(0) ack 66263 win 774 <nop,nop,TS val 1206544341 ecr 331189186>

That's the first ACK to cross 2^16 (65536). Maybe that is a
coincidence, or maybe not. Perhaps the buggy firewall/middlebox/etc is
confused by the TFO option, corrupts its state, and thereafter behaves
incorrectly past the first 64 KBytes of data from the client.
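
To make the 2^16 observation concrete, here is a minimal sketch that
pulls the ack number out of the tcpdump line quoted above and compares
it with 65536. It assumes the trace shows tcpdump's relative sequence
numbers, so the ack value counts bytes of client data acknowledged so
far; the helper name relative_ack is made up for illustration.

```python
import re

# The last sane-looking ACK from the trace, joined onto one line.
LINE = (
    "00:00:00.000010 IP6 2a00:1450:4013:c16::1a.25 > "
    "2c0f:f720:0:3:d6ae:52ff:feb8:f27b.48590: . 6260:6260(0) ack 66263 "
    "win 774 <nop,nop,TS val 1206544341 ecr 331189186>"
)

BOUNDARY = 1 << 16  # 65536, i.e. 64 KBytes


def relative_ack(line):
    """Return the ack number printed in a tcpdump text line (hypothetical helper)."""
    match = re.search(r"\back (\d+)", line)
    if match is None:
        raise ValueError("no ack field in line")
    return int(match.group(1))


ack = relative_ack(LINE)
# Assumption: ack is relative, so it equals client bytes acknowledged so far.
print("ack:", ack)
print("past 2^16:", ack > BOUNDARY)
print("bytes past the boundary:", ack - BOUNDARY)
```

The ack of 66263 is 727 bytes past the 64 KByte mark, which fits the
idea of a component that misbehaves once the client's byte count no
longer fits in 16 bits.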

In addition to checking for checksum failures, mentioned by Eric, you
could look for PAWS failures, something like:

  nstat -az | egrep  -i 'TcpInCsumError|PAWS'
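
If nstat is not installed on the host, roughly the same check can be
sketched in Python against the text format of /proc/net/snmp and
/proc/net/netstat, which pair a header line of counter names with a
line of values. The function names and the SAMPLE text below are made
up for illustration, and the sample values are not from the trace.

```python
import re

# Same pattern as the egrep in the nstat command above.
PATTERN = re.compile(r"TcpInCsumError|PAWS", re.IGNORECASE)

# Made-up sample in the /proc/net/snmp and /proc/net/netstat layout.
SAMPLE = """\
Tcp: InSegs OutSegs InCsumErrors
Tcp: 1000 900 0
TcpExt: PAWSActive PAWSEstab
TcpExt: 0 42
"""


def parse_proc_counters(text):
    """Turn header/value line pairs into {prefix + name: value}."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: names first, then values, sharing one prefix.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            continue  # not a matching pair, skip it
        for name, number in zip(names.split(), numbers.split()):
            counters[prefix + name] = int(number)
    return counters


def checksum_and_paws(counters):
    """Keep only the checksum-error and PAWS counters."""
    return {k: v for k, v in counters.items() if PATTERN.search(k)}


result = checksum_and_paws(parse_proc_counters(SAMPLE))
for name, value in sorted(result.items()):
    print(name, value)
```

On a live host the same functions could be fed the contents of
/proc/net/snmp followed by /proc/net/netstat. A PAWS or checksum
counter that keeps growing while the connection is stalled would point
toward packets being corrupted in flight.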

best,
neal
