lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <5f1bbeb2-efe4-0b10-bc76-37eff30ea905@uls.co.za> Date: Fri, 1 Apr 2022 01:06:26 +0200 From: Jaco Kroon <jaco@....co.za> To: Neal Cardwell <ncardwell@...gle.com>, Eric Dumazet <edumazet@...gle.com> Cc: LKML <linux-kernel@...r.kernel.org>, Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com> Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections Hi Neal, This sniff was grabbed ON THE CLIENT HOST. There is no middlebox or anything between the sniffer and the client. Only the firewall on the host itself, where we've already establish the traffic is NOT DISCARDED (at least not in filter/INPUT). Setup on our end: 2 x routers, usually each with a direct peering with Google (which is being ignored at the moment so instead traffic is incoming via IPT over DD). Connected via switch to 2 x firewalls, of which ONE is active (they have different networks behind them, and could be active / standby for different networks behind them - avoiding active-active because conntrackd is causing more trouble than it's worth), Linux hosts, using netfilter, has been operating for years, no recent kernel upgrades. 4 x hosts in mail cluster, one of which you're looking at here. On 2022/03/31 17:41, Neal Cardwell wrote: > On Wed, Mar 30, 2022 at 9:04 AM Jaco Kroon <jaco@....co.za> wrote: > ... >> When you state sane/normal, do you mean there is fault with the other >> frames that could not be explained by packet loss in one or both of the >> directions? > Yes. > > (1) If you look at the attached trace time/sequence plots (from > tcptrace and xplot.org) there are several behaviors that do not look > like normal congestive packet loss: OK. I'm not 100% sure how these plots of yours work, but let's see if I can follow your logic here - they mostly make sense. A legend would probably help. As I understand the white dots are original transmits, green is what has been ACKED. R is retransmits ... what's the S? What's the yellow line (I'm guessing receive window as advertised by the server)? > > (a) Literally *all* original transmissions (white segments in the > plot) of packets after client sequence 66263 appear lost (are not > ACKed). Congestion generally does not behave like that. But broken > firewalls/middleboxes do. > (See netdev-2022-03-29-tcp-disregarded-acks-zoomed-out.png ) Agreed. So could it be that something in the transit path towards Google is actually dropping all of that? As stated - I highly doubt this is on our network unless newer kernel (on mail cluster) is doing stuff which is causing older netfilter to drop perhaps? But this doesn't explain why newer kernel retransmits data for which it received an ACK. > > (b) When the client is retransmitting packets, only packets at > exactly snd_una are ACKed. The packets beyond that point are always > un-ACKed. Again sounds like a broken firewall/middlebox. > (See netdev-2022-03-29-tcp-disregarded-acks-zoomed-in.png ) No middlebox between packet sniffer and client ... client here is linux 5.17.1. Brings me back to the only thing that could be dropping the traffic is netfilter on the host, or the kernel doesn't like something about the ACK, or kernel is doing something else wrong as a result of TFO. I'm not sure which option I like less. Unfortunately I also use netfilter for redirecting traffic into haproxy here so can't exactly just switch off netfilter. > > (c) After the client receives the server's "ack 73403", the client > ignores/drops all other incoming packets that show up in the trace. Agreed. However, if I read your graph correctly, it gets an ACK for frame X at ~3.8s into the connection, then for X+2 at 4s, but it keeps retransmitting X+2, not X+1? > > As Eric notes, this doesn't look like a PAWS issue. And it > doesn't look like a checksum or sequence/ACK validation issue. The > client starts ignoring ACKs between two ACKs that have correct > checksums, valid ACK numbers, and valid (identical) sequence numbers > and TS val and ecr values (here showing absolute sequence/ACK > numbers): I'm not familiar with PAWS here. Assuming that the green line is ACKs, then at around 4s we get an ACK that basically ACKs two frames in one (which is fine from my understanding of TCP), and then the second of these frames keeps getting retransmitted going forward, so it's almost like the kernel ACKs the *first* of these two frames but not the second. > > (i) The client processes this ACK and uses it to advance snd_una: > 17:46:49.889911 IP6 (flowlabel 0x97427, hlim 61, next-header TCP > (6) payload length: 32) 2a00:1450:4013:c16::1a.25 > > 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.48590: . cksum 0x7005 (correct) > 2699968514:2699968514(0) ack 3451415932 win 830 <nop,nop,TS val > 1206546583 ecr 331191428> > > (ii) The client ignores this ACK and all later ACKs: > 17:46:49.889912 IP6 (flowlabel 0x97427, hlim 61, next-header TCP > (6) payload length: 32) 2a00:1450:4013:c16::1a.25 > > 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.48590: . cksum 0x6a66 (correct) > 2699968514:2699968514(0) ack 3451417360 win 841 <nop,nop,TS val > 1206546583 ecr 331191428> > > neal
Powered by blists - more mailing lists