Message-ID: <SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM>
Date: Fri, 2 Sep 2022 10:29:09 +0000
From: "Arankal, Nagaraj" <nagaraj.p.arankal@....com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: retrans_stamp not cleared while testing NewReno implementation.
While testing the NewReno implementation (SACK disabled) on a 4.19.197-based Debian kernel, we found that connections with very low traffic may time out too early if a second loss occurs after the first one was successfully ACKed but no data was transferred afterwards. Below is a description of the issue:
When SACK is disabled, and a socket suffers multiple separate TCP retransmissions, that socket's ETIMEDOUT value is calculated from the time of the *first* retransmission instead of the *latest* retransmission.
This happens because the tcp_sock's retrans_stamp is set once then never cleared.
Take the following connection:
(*1) One data packet sent.
(*2) Because no ACK packet is received, the packet is retransmitted.
(*3) The ACK packet is received. The transmitted packet is acknowledged.
At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event".
(*4) After 16 minutes (to correspond with tcp_retries2=15), a new data packet is sent. Note: no data is transmitted between (*3) and (*4), and keepalives are disabled.
The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago.
(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT.
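The sequence (*1) to (*6) can be reproduced in a small userspace toy model. This is only a sketch and not kernel code: the struct, the function names and the 924-second budget (roughly what tcp_retries2=15 is documented to yield) are assumptions made for illustration. It shows that a stamp which is set once and never cleared makes the second event look 16 minutes old.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only (hypothetical names). Times are in seconds. */
struct toy_sock {
    uint32_t retrans_stamp; /* time of the first retransmission, 0 = unset */
};

/* Assumed give-up budget standing in for tcp_retries2=15. */
#define TOY_TIMEOUT_SEC 924u

/* Record a retransmission; the stamp is written only while it is unset. */
static void toy_retransmit(struct toy_sock *sk, uint32_t now)
{
    assert(now != 0); /* 0 is reserved for "unset" */
    if (sk->retrans_stamp == 0)
        sk->retrans_stamp = now;
}

/* Give up once the time elapsed since the stamp reaches the budget. */
static bool toy_timed_out(const struct toy_sock *sk, uint32_t now)
{
    return sk->retrans_stamp != 0 &&
           now - sk->retrans_stamp >= TOY_TIMEOUT_SEC;
}

/* Returns whether the retransmission in (*5) immediately times out. */
bool toy_scenario(bool had_earlier_event)
{
    struct toy_sock sk = { 0 };
    uint32_t t5 = 10 + 16 * 60 + 1; /* a bit more than 16 minutes after (*2) */

    if (had_earlier_event)
        toy_retransmit(&sk, 10);    /* (*2); the ACK in (*3) leaves the stamp alone */
    toy_retransmit(&sk, t5);        /* (*5) */
    return toy_timed_out(&sk, t5);  /* (*6) */
}
```

In this model, toy_scenario(true) reports a timeout at the very first retransmission of the second event, while toy_scenario(false), a socket with no earlier event, does not.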
From the history, I learned that a fix was included that should resolve the above issue. Please find the patch below.
static bool tcp_try_undo_recovery(struct sock *sk)
 		 * is ACKed. For Reno it is MUST to prevent false
 		 * fast retransmits (RFC2582). SACK TCP is safe. */
 		tcp_moderate_cwnd(tp);
+		if (!tcp_any_retrans_done(sk))
+			tp->retrans_stamp = 0;
 		return true;
 	}
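The intent of the two added lines can be sketched in a userspace toy model. This is a simplification under stated assumptions and not the kernel implementation: the retrans_out counter stands in for whatever tcp_any_retrans_done() inspects, and all toy_ names are hypothetical. Once an ACK leaves no retransmitted data outstanding, the stamp is cleared so that the next retransmission starts a fresh timeout.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only (hypothetical names). Times are in seconds. */
struct toy_sock {
    uint32_t retrans_stamp; /* time of the first retransmission, 0 = unset */
    uint32_t retrans_out;   /* retransmitted segments not yet ACKed */
};

/* Record a retransmission; the stamp is written only while it is unset. */
static void toy_retransmit(struct toy_sock *sk, uint32_t now)
{
    assert(now != 0); /* 0 is reserved for "unset" */
    if (sk->retrans_stamp == 0)
        sk->retrans_stamp = now;
    sk->retrans_out++;
}

/* ACK for `acked` retransmitted segments, followed by the patched check. */
static void toy_ack(struct toy_sock *sk, uint32_t acked)
{
    sk->retrans_out -= (acked < sk->retrans_out) ? acked : sk->retrans_out;
    if (sk->retrans_out == 0)   /* stand-in for !tcp_any_retrans_done(sk) */
        sk->retrans_stamp = 0;  /* the next event starts from scratch */
}

/* Returns the stamp seen at the second retransmission event. */
uint32_t toy_second_event_stamp(void)
{
    struct toy_sock sk = { 0, 0 };

    toy_retransmit(&sk, 10);  /* first event */
    toy_ack(&sk, 1);          /* recovered, stamp cleared */
    toy_retransmit(&sk, 971); /* second event, about 16 minutes later */
    return sk.retrans_stamp;
}
```

With the clear in place, toy_second_event_stamp() returns the time of the second retransmission rather than the first.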
However, after the following fix was introduced,
[net,1/2] tcp: only undo on partial ACKs in CA_Loss
I no longer see retrans_stamp being reset to zero.
Inside tcp_process_loss(), we return through the following code path:
	if ((flag & FLAG_SND_UNA_ADVANCED) &&
	    tcp_try_undo_loss(sk, false))
		return;
As a result, tcp_try_undo_recovery() is never invoked and tp->retrans_stamp is never cleared.
Is this a known bug in the kernel code, or is it expected behavior?
- Thanks in advance,
Nagaraj