Message-ID: <SJ0PR84MB18471A151DA83FD70075B354B27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM>
Date: Fri, 2 Sep 2022 14:19:42 +0000
From: "Arankal, Nagaraj" <nagaraj.p.arankal@....com>
To: Neal Cardwell <ncardwell@...gle.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>
Subject: RE: retrans_stamp not cleared while testing NewReno implementation.
Hi Neal,
Thanks a lot.
I shall update my Kernel with proposed fix and run the tests again.
Regards,
Nagaraj P Arankal
-----Original Message-----
From: Neal Cardwell <ncardwell@...gle.com>
Sent: Friday, September 2, 2022 7:25 PM
To: Arankal, Nagaraj <nagaraj.p.arankal@....com>
Cc: netdev@...r.kernel.org; Eric Dumazet <edumazet@...gle.com>; Yuchung Cheng <ycheng@...gle.com>
Subject: Re: retrans_stamp not cleared while testing NewReno implementation.
"
On Fri, Sep 2, 2022 at 6:29 AM Arankal, Nagaraj <nagaraj.p.arankal@....com> wrote:
>
> While testing the NewReno implementation (SACK disabled) on a 4.19.197-based Debian kernel, with connections that carry very little traffic, we may time out the connection too early if a second loss occurs after the first one was successfully ACKed but no data was transferred in between. Below is a description of it:
>
> When SACK is disabled, and a socket suffers multiple separate TCP retransmission episodes, that socket's retransmission timeout (which ultimately yields ETIMEDOUT) is calculated from the time of the *first* retransmission instead of the *latest* retransmission.
>
> This happens because the tcp_sock's retrans_stamp is set once then never cleared.
>
> Take the following connection:
>
>
> (*1) One data packet sent.
> (*2) Because no ACK packet is received, the packet is retransmitted.
> (*3) The ACK packet is received. The transmitted packet is acknowledged.
>
> At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event".
>
> (*4) After 16 minutes (to correspond with tcp_retries2=15), a new data packet is sent. Note: No data is transmitted between (*3) and (*4) and we disabled keep alives.
>
> The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago.
>
> (*5) Because no ACK packet is received, the packet is retransmitted.
> (*6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT.
>
> From the git history I learned that a fix had been included which would resolve the above issue. Please find the patch below.
>
> static bool tcp_try_undo_recovery(struct sock *sk)
> 		/* Hold old state until something *above* high_seq
> 		 * is ACKed. For Reno it is MUST to prevent false
> 		 * fast retransmits (RFC2582). SACK TCP is safe. */
> 		tcp_moderate_cwnd(tp);
> +		if (!tcp_any_retrans_done(sk))
> +			tp->retrans_stamp = 0;
> 		return true;
> 	}
>
> However, after introducing following fix,
>
> [net,1/2] tcp: only undo on partial ACKs in CA_Loss
>
> I am not able to see retrans_stamp reset to zero.
> Inside tcp_process_loss, we are returning from the code path below:
>
> 	if ((flag & FLAG_SND_UNA_ADVANCED) &&
> 	    tcp_try_undo_loss(sk, false))
> 		return;
> because of which tp->retrans_stamp is never cleared, as we never invoke tcp_try_undo_recovery().
>
> Is this a known bug in the kernel code, or is it expected behavior?
>
>
> - Thanks in advance,
> Nagaraj
Thanks for the detailed bug report and analysis! I agree that
"tcp: only undo on partial ACKs in CA_Loss" introduced the bug that you are analyzing.
I suspect we need a fix along the lines below. I will try to create a packetdrill test to reproduce this and verify the fix below, and will run this fix through our existing packetdrill tests.
Thanks!
commit d2f706c1be7e9822a99477edd69bc13ddd00557f
Author: Neal Cardwell <ncardwell@...gle.com>
Date: Fri Sep 2 09:36:23 2022 -0400
tcp: fix early ETIMEDOUT after spurious non-SACK RTO
Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
of a spurious non-SACK RTO could cause a connection to fail to clear
retrans_stamp, causing a later RTO to very prematurely time out the
connection with ETIMEDOUT.
Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
report:
(*1) Send one data packet on a non-SACK connection
(*2) Because no ACK packet is received, the packet is retransmitted
and we enter CA_Loss; but this retransmission is spurious.
(*3) The ACK for the original data is received. The transmitted packet
is acknowledged. The TCP timestamp is before the retrans_stamp,
so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
true without changing state to Open (because tcp_is_sack() is
false), and tcp_process_loss() returns without calling
tcp_try_undo_recovery(). Normally after undoing a CA_Loss
episode, tcp_fastretrans_alert() would see that the connection
has returned to CA_Open and fall through and call
tcp_try_to_open(), which would set retrans_stamp to 0. However,
for non-SACK connections we hold the connection in CA_Loss, so do
not fall through to call tcp_try_to_open() and do not set
retrans_stamp to 0. So retrans_stamp is (erroneously) still
non-zero.
At this point the first "retransmission event" has passed and
been recovered from. Any future retransmission is a completely
new "event". However, retrans_stamp is erroneously still
set. (And we are still in CA_Loss, which is correct.)
(*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
packet is sent. Note: No data is transmitted between (*3) and
(*4) and we disabled keep alives.
The socket's timeout SHOULD be calculated from this point in
time, but instead it's calculated from the prior "event" 16
minutes ago (step (*2)).
(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns
ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
too far in the past (set at the time of (*2)).
This commit fixes this bug by ensuring that we reuse in
tcp_try_undo_loss() the same careful logic for non-SACK connections
that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
we factor out that logic into a new
tcp_is_non_sack_preventing_reopen() helper and call that helper from
both undo functions.
Fixes: da34ac7626b5 ("tcp: only undo on partial ACKs in CA_Loss")
Reported-by: Nagaraj Arankal <nagaraj.p.arankal@....com>
Signed-off-by: Neal Cardwell <ncardwell@...gle.com>
Change-Id: Ie58ea40bdbfe0643111a17a41eda0674f62ce76d
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b85a9f755da41..bc2ea12221f95 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2513,6 +2513,21 @@ static inline bool tcp_may_undo(const struct tcp_sock *tp)
 	return tp->undo_marker && (!tp->undo_retrans || tcp_packet_delayed(tp));
 }
+static bool tcp_is_non_sack_preventing_reopen(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
+		/* Hold old state until something *above* high_seq
+		 * is ACKed. For Reno it is MUST to prevent false
+		 * fast retransmits (RFC2582). SACK TCP is safe. */
+		if (!tcp_any_retrans_done(sk))
+			tp->retrans_stamp = 0;
+		return true;
+	}
+	return false;
+}
+
 /* People celebrate: "We love our President!" */
 static bool tcp_try_undo_recovery(struct sock *sk)
 {
@@ -2535,14 +2550,8 @@ static bool tcp_try_undo_recovery(struct sock *sk)
 	} else if (tp->rack.reo_wnd_persist) {
 		tp->rack.reo_wnd_persist--;
 	}
-	if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
-		/* Hold old state until something *above* high_seq
-		 * is ACKed. For Reno it is MUST to prevent false
-		 * fast retransmits (RFC2582). SACK TCP is safe. */
-		if (!tcp_any_retrans_done(sk))
-			tp->retrans_stamp = 0;
+	if (tcp_is_non_sack_preventing_reopen(sk))
 		return true;
-	}
 	tcp_set_ca_state(sk, TCP_CA_Open);
 	tp->is_sack_reneg = 0;
 	return false;
@@ -2578,6 +2587,8 @@ static bool tcp_try_undo_loss(struct sock *sk, bool frto_undo)
 			NET_INC_STATS(sock_net(sk),
 				      LINUX_MIB_TCPSPURIOUSRTOS);
 		inet_csk(sk)->icsk_retransmits = 0;
+		if (tcp_is_non_sack_preventing_reopen(sk))
+			return true;
 		if (frto_undo || tcp_is_sack(tp)) {
 			tcp_set_ca_state(sk, TCP_CA_Open);
 			tp->is_sack_reneg = 0;
neal