Message-ID: <CADVnQyn=MXohOf1vskJcm9VTOeP31Y5AqCPu7B=zZuTB8nh8Eg@mail.gmail.com>
Date: Tue, 20 May 2025 23:04:40 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Simon Campion <simon.campion@...pl.com>
Cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>, Kevin Yang <yyd@...gle.com>, Jon Maloy <jmaloy@...hat.com>
Subject: Re: [EXT] Re: tcp: socket stuck with zero receive window after SACK
cc += Jon Maloy <jmaloy@...hat.com>
On Mon, May 19, 2025 at 11:03 AM Simon Campion <simon.campion@...pl.com> wrote:
>
> Gladly! I attached the output of nstat -az. I ran it twice, right
> before a 602 byte retransmit was received and dropped, and right
> after, in case looking at the diff is helpful.
Thanks, Simon, for the data!
Skimming the data and the code for your kernel (6.6.83), I have a theory:
In your nstat data, we see TcpExtTCPZeroWindowDrop is incremented by 1
when the 602 byte retransmit was received and dropped:
> < TcpExtTCPZeroWindowDrop 485489 0.0
> ---
> > TcpExtTCPZeroWindowDrop 485490 0.0
That SNMP stat (corresponding to the SKB_DROP_REASON_TCP_ZEROWINDOW
drop reason Simon mentioned earlier) is incremented by
tcp_data_queue() when an in-order packet arrives and
tcp_receive_window(tp) == 0, and the packet is dropped.
But, critically, tcp_data_queue() in that code path does not call
tcp_try_rmem_schedule() to try to free up memory.
Why is tcp_receive_window(tp) == 0 in this case? A conjecture:
(a) I bet the machine was probably under memory pressure earlier,
triggering ICSK_ACK_NOMEM
(b) We can see your kernel 6.6.83 has a backport of the recent bug fix
patch that sets tp->rcv_wnd = 0 upon ICSK_ACK_NOMEM events:
commit b01e7ceb35dcb7ffad413da657b78c3340a09039
Author: Jon Maloy <jmaloy@...hat.com>
Date: Mon Jan 27 18:13:04 2025 -0500
tcp: correct handling of extreme memory squeeze
[ Upstream commit 8c670bdfa58e48abad1d5b6ca1ee843ca91f7303 ]
...
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cfddc94508f0b..3771ed22c2f56 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -263,11 +263,14 @@ static u16 tcp_select_window(struct sock *sk)
        u32 cur_win, new_win;
        /* Make the window 0 if we failed to queue the data because we
-        * are out of memory. The window is temporary, so we don't store
-        * it on the socket.
+        * are out of memory.
         */
-       if (unlikely(inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM))
+       if (unlikely(inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM)) {
+               tp->pred_flags = 0;
+               tp->rcv_wnd = 0;
+               tp->rcv_wup = tp->rcv_nxt;
                return 0;
+       }
---
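To spell out why those assignments pin the window at zero: tcp_receive_window() computes rcv_wup + rcv_wnd - rcv_nxt and clamps negative results to 0. Below is a rough userspace sketch of just that arithmetic. The struct and function names (toy_win, win_receive_window, win_apply_nomem) are made up for illustration and are not kernel code, and the pred_flags reset is left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three tcp_sock fields involved. */
struct toy_win {
    uint32_t rcv_nxt; /* next sequence number expected */
    uint32_t rcv_wup; /* rcv_nxt at the last window update sent */
    uint32_t rcv_wnd; /* window advertised at that time */
};

/* Same arithmetic as the kernel's tcp_receive_window(): the space
 * still open in the advertised window, clamped at zero. */
uint32_t win_receive_window(const struct toy_win *tp)
{
    int32_t win = (int32_t)(tp->rcv_wup + tp->rcv_wnd - tp->rcv_nxt);

    return win < 0 ? 0 : (uint32_t)win;
}

/* The two window assignments the patch adds on ICSK_ACK_NOMEM. */
void win_apply_nomem(struct toy_win *tp)
{
    tp->rcv_wnd = 0;
    tp->rcv_wup = tp->rcv_nxt;
}
```

With rcv_wnd = 0 and rcv_wup = rcv_nxt the expression is exactly 0, and it stays 0 until something stores a new non-zero rcv_wnd on the socket.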
Putting this all together, a conjecture about what happened:
+ the machine was under memory pressure, so triggered ICSK_ACK_NOMEM
+ this caused the new "tcp: correct handling of extreme memory
squeeze" patch to set tp->rcv_wnd = 0
+ this caused tcp_data_queue() to see the in-order packet arrive with
tcp_receive_window(tp) == 0, so the packet was dropped, incrementing
TcpExtTCPZeroWindowDrop
+ tcp_data_queue() in that code path does not call
tcp_try_rmem_schedule() to try to free up memory
+ so even if more memory was available at this point,
tcp_try_rmem_schedule() is not called, because of the new "tcp:
correct handling of extreme memory squeeze" patch
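The sequence above can be played through in a small userspace toy model. This is only a sketch of the conjecture, not kernel code: every name in it (toy_sock, toy_data_queue, mem_available, and so on) is hypothetical, it handles in-order segments only, and it ignores every other path by which the real stack might reopen the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the socket state discussed above. */
struct toy_sock {
    uint32_t rcv_nxt, rcv_wup, rcv_wnd;
    bool ack_nomem;             /* models ICSK_ACK_NOMEM being pending */
    bool mem_available;         /* assumption: would scheduling rmem succeed? */
    unsigned zero_window_drops; /* models TcpExtTCPZeroWindowDrop */
    unsigned rmem_attempts;     /* times the toy rmem-schedule step ran */
};

/* Same arithmetic as tcp_receive_window(). */
uint32_t toy_receive_window(const struct toy_sock *sk)
{
    int32_t win = (int32_t)(sk->rcv_wup + sk->rcv_wnd - sk->rcv_nxt);

    return win < 0 ? 0 : (uint32_t)win;
}

/* Models the patched NOMEM branch of tcp_select_window(). */
uint32_t toy_select_window(struct toy_sock *sk, uint32_t normal_win)
{
    sk->rcv_wnd = sk->ack_nomem ? 0 : normal_win;
    sk->rcv_wup = sk->rcv_nxt;
    return sk->rcv_wnd;
}

/* Models the in-order path: true if queued, false if dropped. */
bool toy_data_queue(struct toy_sock *sk, uint32_t seq, uint32_t len)
{
    assert(seq == sk->rcv_nxt); /* only in-order segments are modelled */

    if (toy_receive_window(sk) == 0) {
        /* Zero-window drop: returns before any rmem attempt. */
        sk->zero_window_drops++;
        return false;
    }
    sk->rmem_attempts++;        /* stands in for tcp_try_rmem_schedule() */
    if (!sk->mem_available) {
        sk->ack_nomem = true;
        return false;
    }
    sk->rcv_nxt += len;
    return true;
}
```

In this model, once toy_select_window() has stored a zero window, flipping mem_available back to true changes nothing: a retransmit of the same segment only bumps zero_window_drops, and rmem_attempts never moves again.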
I suppose one possible fix would be to change tcp_data_queue() in that
(tcp_receive_window(tp) == 0) case, to make sure it calls
tcp_try_rmem_schedule() to try to free up memory.
Eric and Jon, WDYT?
It's a bit past my bedtime here in NYC so I may not be thinking straight.... :-)
thanks,
neal