netdev - Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections

Message-ID: <CANn89i+Dqtrm-7oW+D6EY+nVPhRH07GXzDXt93WgzxZ1y9_tJA@mail.gmail.com>
Date:   Wed, 30 Mar 2022 09:19:53 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Jaco Kroon <jaco@....co.za>
Cc:     Neal Cardwell <ncardwell@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections

On Wed, Mar 30, 2022 at 9:04 AM Jaco Kroon <jaco@....co.za> wrote:
>
> Hi,
>
> On 2022/03/30 15:56, Neal Cardwell wrote:
> > On Wed, Mar 30, 2022 at 2:22 AM Jaco Kroon <jaco@....co.za> wrote:
> >> Hi Eric,
> >>
> >> On 2022/03/30 05:48, Eric Dumazet wrote:
> >>> On Tue, Mar 29, 2022 at 7:58 PM Jaco Kroon <jaco@....co.za> wrote:
> >>>
> >>> I do not think this commit is related to the issue you have.
> >>>
> >>> I guess you could try a revert ?
> >>>
> >>> Then, if you think old linux versions were ok, start a bisection ?
> >> That'll be interesting, will see if I can reproduce on a non-production
> >> host.
> >>> Thank you.
> >>>
> >>> (I do not see why a successful TFO would lead to a freeze after ~70 KB
> >>> of data has been sent)
> >> I do actually agree with this in that it makes no sense, but disabling
> >> TFO definitely resolved the issue for us.
> >>
> >> Kind Regards,
> >> Jaco
> > Thanks for the pcap trace! That's a pretty strange trace. I agree with
> > Eric's theory that this looks like one or more bugs in a firewall,
> > middlebox, or netfilter rule. From the trace it looks like the buggy
> > component is sometimes dropping packets and sometimes corrupting them
> > so that the client's TCP stack ignores them.
> The capture was taken on the client.  So the only firewall there is
> iptables, and I redirected all -j DROP statements to a L_DROP chain
> which did a -j LOG prior to -j DROP - didn't pick up any drops here.
> >
> > Interestingly, in that trace the client SYN has a TFO option and
> > cookie, but no data in the SYN.
>
> So this allows the SMTP server, which speaks first in the conversation
> to identify itself, to respond with data in the SYN (not sure that was
> actually happening, but if I recall I did see it send data prior to
> receiving the final ACK on the handshake).
>
> >
> > The last packet that looks sane/normal is the ACK from the SMTP server
> > that looks like:
> >
> > 00:00:00.000010 IP6 2a00:1450:4013:c16::1a.25 >
> > 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.48590: . 6260:6260(0) ack 66263 win
> > 774 <nop,nop,TS val 1206544341 ecr 331189186>
> >
> > That's the first ACK that crosses past 2^16. Maybe that is a
> > coincidence, or maybe not. Perhaps the buggy firewall/middlebox/etc is
>
> I believe it should be because we literally had this on every single
> connection going out to Google's SMTP ... probably 1/100 connections
> managed to deliver an email over the connection.  Then again ... 64KB
> isn't that much ...
>
> When you state sane/normal, do you mean there is a fault with the other
> frames that could not be explained by packet loss in one or both of the
> directions?
>
> > confused by the TFO option, corrupts its state, and thereafter behaves
> > incorrectly past the first 64 KBytes of data from the client.
>
> Only firewalls we've got are netfilter based, and these packets all
> passed through the dedicated firewalls at least by the time they reach
> here.  No middleboxes on our end, and if this was Google's side there
> would be crazy noise heard from others, not just me.  I think the trigger is
> packet loss between us (as indicated we know they have link congestion
> issues in JHB area, it took us the better part of two weeks to get the
> first line tech on their side to just query the internal teams and
> probably another week to get the response acknowledging this -
> mybroadband.co.za has an article about other local ISPs also complaining).
>
> >
> > In addition to checking for checksum failures, mentioned by Eric, you
> > could look for PAWS failures, something like:
> >
> >   nstat -az | grep -Ei 'TcpInCsumError|PAWS'
>
> TcpInCsumErrors                 0                  0.0
> TcpExtPAWSActive                0                  0.0
> TcpExtPAWSEstab                 90092              0.0
> TcpExtTCPACKSkippedPAWS         81317              0.0
>
> Not sure what these mean, but I should probably investigate; the latter
> two are definitely incrementing.
>
> Appreciate the feedback and for looking at the traces.
>

Your pcap does not show any obvious PAWS issues.
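
For context on the PAWS counters quoted above: PAWS (Protection Against
Wrapped Sequences, RFC 7323) discards a segment whose TCP timestamp value
is older than the most recent timestamp accepted on that connection. Below
is a minimal Python sketch of that comparison, assuming only the simplified
RFC rule. The function names are illustrative, and the kernel's real check
has further conditions (for example an exception for long-idle connections)
that are not modelled here.

```python
MASK32 = 0xFFFFFFFF


def ts_before(a: int, b: int) -> bool:
    """Return True if 32-bit timestamp a is older than b, modulo 2**32."""
    diff = (a - b) & MASK32
    # A difference in the upper half of the 32-bit range is negative
    # when read as a signed 32-bit integer, so a lies "before" b.
    return diff >= 0x80000000


def paws_reject(seg_tsval: int, ts_recent: int) -> bool:
    """Simplified PAWS test: reject a segment whose TSval is older than
    the last timestamp accepted from the peer (ts_recent)."""
    return ts_before(seg_tsval, ts_recent)


# TS val taken from the ACK quoted earlier in this thread.
ts_recent = 1206544341
print(paws_reject(ts_recent - 1, ts_recent))  # older timestamp
print(paws_reject(ts_recent + 5, ts_recent))  # newer timestamp
```

The modular subtraction is what lets the comparison keep working when the
32-bit timestamp clock wraps around.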

If the host is lightly loaded, you could try this while the connection is
being attempted or is frozen:

perf record -a -g -e skb:kfree_skb sleep 30
perf script  (or perf report)
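
To check whether the PAWS and checksum counters quoted above actually move
during a frozen connection, they can be sampled twice and compared. The
sketch below is one way to do that in Python, assuming the usual Linux
layout of /proc/net/snmp and /proc/net/netstat (a header line of counter
names followed by a line of values, both starting with the same prefix).
All function names are illustrative, not part of any existing tool.

```python
import time


def parse_counters(text):
    """Parse /proc/net/snmp-style text into {prefix + name: value}."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: "TcpExt: NameA NameB" then "TcpExt: 1 2".
    for header, values in zip(lines[0::2], lines[1::2]):
        h_prefix, h_rest = header.split(":", 1)
        v_prefix, v_rest = values.split(":", 1)
        if h_prefix != v_prefix:
            continue  # not a matching header/value pair, skip it
        for name, value in zip(h_rest.split(), v_rest.split()):
            counters[h_prefix + name] = int(value)
    return counters


def read_counters(paths=("/proc/net/snmp", "/proc/net/netstat")):
    """Read and merge the counters from the given proc files."""
    merged = {}
    for path in paths:
        with open(path) as fh:
            merged.update(parse_counters(fh.read()))
    return merged


def counter_deltas(before, after, needles=("PAWS", "InCsumErrors")):
    """Return the counters matching any needle that changed."""
    deltas = {}
    for name, value in after.items():
        if any(n in name for n in needles):
            change = value - before.get(name, 0)
            if change:
                deltas[name] = change
    return deltas


def watch(seconds=30):
    """Sample the live counters twice, `seconds` apart (Linux only)."""
    before = read_counters()
    time.sleep(seconds)
    return counter_deltas(before, read_counters())
```

Calling watch(30) while the connection is stuck returns only the matching
counters that changed in that window. The counters are host-wide, so on a
busy machine the increments may come from unrelated connections.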