netdev - Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5f1bbeb2-efe4-0b10-bc76-37eff30ea905@uls.co.za>
Date:   Fri, 1 Apr 2022 01:06:26 +0200
From:   Jaco Kroon <jaco@....co.za>
To:     Neal Cardwell <ncardwell@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP
 connections

Hi Neal,

This sniff was grabbed ON THE CLIENT HOST.  There is no middlebox or
anything between the sniffer and the client.  Only the firewall on the
host itself, where we've already establish the traffic is NOT DISCARDED
(at least not in filter/INPUT).

Setup on our end:

2 x routers, usually each with a direct peering with Google (which is
being ignored at the moment so instead traffic is incoming via IPT over DD).

Connected via switch to

2 x firewalls, of which ONE is active (they have different networks
behind them, and could be active / standby for different networks behind
them - avoiding active-active because conntrackd is causing more trouble
than it's worth), Linux hosts, using netfilter, has been operating for
years, no recent kernel upgrades.

4 x hosts in mail cluster, one of which you're looking at here.

On 2022/03/31 17:41, Neal Cardwell wrote:
> On Wed, Mar 30, 2022 at 9:04 AM Jaco Kroon <jaco@....co.za> wrote:
> ...
>> When you state sane/normal, do you mean there is fault with the other
>> frames that could not be explained by packet loss in one or both of the
>> directions?
> Yes.
>
> (1) If you look at the attached trace time/sequence plots (from
> tcptrace and xplot.org) there are several behaviors that do not look
> like normal congestive packet loss:
OK.  I'm not 100% sure how these plots of yours work, but let's see if I
can follow your logic here - they mostly make sense.  A legend would
probably help.  As I understand the white dots are original transmits,
green is what has been ACKED.  R is retransmits ... what's the S? 
What's the yellow line (I'm guessing receive window as advertised by the
server)?
>
>   (a) Literally *all* original transmissions (white segments in the
> plot) of packets after client sequence 66263 appear lost (are not
> ACKed). Congestion generally does not behave like that. But broken
> firewalls/middleboxes do.
>        (See netdev-2022-03-29-tcp-disregarded-acks-zoomed-out.png )

Agreed.  So could it be that something in the transit path towards
Google is actually dropping all of that?

As stated - I highly doubt this is on our network unless newer kernel
(on mail cluster) is doing stuff which is causing older netfilter to
drop perhaps?  But this doesn't explain why newer kernel retransmits
data for which it received an ACK.

>
>   (b) When the client is retransmitting packets, only packets at
> exactly snd_una are ACKed. The packets beyond that point are always
> un-ACKed. Again sounds like a broken firewall/middlebox.
>        (See netdev-2022-03-29-tcp-disregarded-acks-zoomed-in.png )
No middlebox between packet sniffer and client ... client here is linux
5.17.1.  Brings me back to the only thing that could be dropping the
traffic is netfilter on the host, or the kernel doesn't like something
about the ACK, or kernel is doing something else wrong as a result of
TFO.  I'm not sure which option I like less.  Unfortunately I also use
netfilter for redirecting traffic into haproxy here so can't exactly
just switch off netfilter.
>
>   (c) After the client receives the server's "ack 73403", the client
> ignores/drops all other incoming packets that show up in the trace.

Agreed.  However, if I read your graph correctly, it gets an ACK for
frame X at ~3.8s into the connection, then for X+2 at 4s, but it keeps
retransmitting X+2, not X+1?


>
>        As Eric notes, this doesn't look like a PAWS issue. And it
> doesn't look like a checksum or sequence/ACK validation issue. The
> client starts ignoring ACKs between two ACKs that have correct
> checksums, valid ACK numbers, and valid (identical) sequence numbers
> and TS val and ecr values (here showing absolute sequence/ACK
> numbers):
I'm not familiar with PAWS here.  Assuming that the green line is ACKs,
then at around 4s we get an ACK that basically ACKs two frames in one
(which is fine from my understanding of TCP), and then the second of
these frames keeps getting retransmitted going forward, so it's almost
like the kernel ACKs the *first* of these two frames but not the second.
>
>     (i) The client processes this ACK and uses it to advance snd_una:
>     17:46:49.889911 IP6 (flowlabel 0x97427, hlim 61, next-header TCP
> (6) payload length: 32) 2a00:1450:4013:c16::1a.25 >
> 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.48590: . cksum 0x7005 (correct)
> 2699968514:2699968514(0) ack 3451415932 win 830 <nop,nop,TS val
> 1206546583 ecr 331191428>

>
>     (ii) The client ignores this ACK and all later ACKs:
>     17:46:49.889912 IP6 (flowlabel 0x97427, hlim 61, next-header TCP
> (6) payload length: 32) 2a00:1450:4013:c16::1a.25 >
> 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.48590: . cksum 0x6a66 (correct)
> 2699968514:2699968514(0) ack 3451417360 win 841 <nop,nop,TS val
> 1206546583 ecr 331191428>
>
> neal