[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADVnQy=GX0J_QbMJXogGzPwD=f0diKDDxLiHV0gzrb4bo=4FjA@mail.gmail.com>
Date: Fri, 1 Apr 2022 11:39:24 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Jaco Kroon <jaco@....co.za>
Cc: Eric Dumazet <edumazet@...gle.com>,
LKML <linux-kernel@...r.kernel.org>,
Netdev <netdev@...r.kernel.org>,
Yuchung Cheng <ycheng@...gle.com>, Wei Wang <weiwan@...gle.com>
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections
On Tue, Mar 29, 2022 at 9:03 PM Jaco <jaco@....co.za> wrote:
...
> Connection setup:
>
> 00:56:17.055481 IP6 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.59110 > 2a00:1450:400c:c07::1b.25: Flags [S], seq 956633779, win 62580, options [mss 8940,nop,nop,TS val 3687705482 ecr 0,nop,wscale 7,tfo cookie f025dd84b6122510,nop,nop], length 0
>
> 00:56:17.217747 IP6 2a00:1450:400c:c07::1b.25 > 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.59110: Flags [S.], seq 726465675, ack 956633780, win 65535, options [mss 1440,nop,nop,TS val 3477429218 ecr 3687705482,nop,wscale 8], length 0
>
> 00:56:17.218628 IP6 2a00:1450:400c:c07::1b.25 > 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.59110: Flags [P.], seq 726465676:726465760, ack 956633780, win 256, options [nop,nop,TS val 3477429220 ecr 3687705482], length 84: SMTP: 220 mx.google.com ESMTP e16-20020a05600c4e5000b0038c77be9b2dsi226281wmq.72 - gsmtp
>
> 00:56:17.218663 IP6 2c0f:f720:0:3:d6ae:52ff:feb8:f27b.59110 > 2a00:1450:400c:c07::1b.25: Flags [.], ack 726465760, win 489, options [nop,nop,TS val 3687705645 ecr 3477429220], length 0
>
> This is pretty normal, we advertise an MSS of 8940 and the return is 1440, thus
> we shouldn't send segments larger than that, and they "can't". I need to
> determine if this is some form of offloading or they really are sending >1500
> byte frames (which I know won't pass our firewalls without fragmentation so
> probably some form of NIC offloading - which if it was active on older 5.8
> kernels did not cause problems):
Jaco, was there some previous kernel version on these client machines
where this problem did not show up? Perhaps the v5.8 version you
mention here? Can you please share the exact version number?
If so, a hypothesis would be:
(1) There is a bug in netfilter's handling of TFO connections where
the server sends a data packet after a TFO SYNACK, before the client
ACKs anything (as we see in this trace).
This bug is perhaps similar in character to the bug fixed by Yuchung's
2013 commit that Eric mentioned:
356d7d88e088687b6578ca64601b0a2c9d145296
netfilter: nf_conntrack: fix tcp_in_window for Fast Open
(2) With kernel v5.8, TFO blackhole detection detected that in your
workload there were TFO connections that died due to apparent
blackholing (like what's shown in the trace), and dynamically disabled
TFO on your machines. This allowed mail traffic to flow, because the
netfilter bug was no longer tickled. This worked around the netfilter
bug.
(3) You upgraded your client-side machine from v5.8 to v5.17, which
has the following commit from v5.14, which disables TFO blackhole
logic by default:
213ad73d0607 tcp: disable TFO blackhole logic by default
(4) Due to (3), the blackhole detection logic was no longer operative,
and when the netfilter bug blackholed the connection, TFO stayed
enabled. This caused mail traffic to Google to stall.
This hypothesis would explain why:
o disabling TFO fixes this problem
o you are seeing this with a newer kernel (and apparently not with a
kernel before v5.14?)
With this hypothesis, we need several pieces to trigger this:
(a) client side software that tries TFO to a server that supports TFO
(like the exim mail transfer agent you are using, connecting to
Google)
(b) a client-side Linux kernel running buggy netfilter code (you are
running netfilter)
(c) a client-side Linux kernel with TFO support but no blackhole
detection logic active (e.g. v5.14 or later, like your v5.17.1)
That's probably a rare combination, so would explain why we have not
had this report before.
Jaco, to provide some evidence for this hypothesis, can you please
re-enable fastopen but also enable the TFO blackhole detection that
was disabled in v5.14 (213ad73d0607), with something like:
sysctl -w net.ipv4.tcp_fastopen=1
sysctl -w tcp_fastopen_blackhole_timeout=3600
And then after a few hours, check to see if this blackholing behavior
has been detected:
nstat -az | grep -i blackhole
And see if TFO FastOpenActive attempts have been cut to a super-low rate:
nstat -az | grep -i fastopenactive
thanks,
neal
Powered by blists - more mailing lists