netdev - retrans_stamp not cleared while testing NewReno implementation.

Message-ID: <SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM>
Date:   Fri, 2 Sep 2022 10:29:09 +0000
From:   "Arankal, Nagaraj" <nagaraj.p.arankal@....com>
To:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: retrans_stamp not cleared while testing NewReno implementation.

While testing the NewReno implementation (SACK disabled) on a Debian kernel based on 4.19.197, I found that connections with very low traffic may time out too early if a second loss occurs after the first one was successfully ACKed and no data was transferred in between. Below is the original description of the problem:

When SACK is disabled, and a socket suffers multiple separate TCP retransmissions, that socket's ETIMEDOUT value is calculated from the time of the *first* retransmission instead of the *latest* retransmission.

This happens because the tcp_sock's retrans_stamp is set once then never cleared.

Take the following connection:

(*1) One data packet sent.
(*2) Because no ACK packet is received, the packet is retransmitted.
(*3) The ACK packet is received. The transmitted packet is acknowledged.

At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event".

(*4) After 16 minutes (to correspond with tcp_retries2=15), a new data packet is sent. Note: no data is transmitted between (*3) and (*4), and keepalives are disabled.

The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago.

(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT.
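
The timeline above can be replayed with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct conn, the helpers retransmit(), ack_all(), timed_out() and replay(), and the 900-second limit are all hypothetical stand-ins chosen to mirror steps (*1) to (*6). The real kernel derives its limit from tcp_retries2 and the RTO backoff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-socket state discussed. */
struct conn {
    uint32_t retrans_stamp; /* time of first retransmission; 0 = unset */
};

/* Record a retransmission at time `now` (seconds).
 * The stamp is only written when it is currently unset. */
static void retransmit(struct conn *c, uint32_t now)
{
    if (c->retrans_stamp == 0)
        c->retrans_stamp = now;
}

/* All outstanding data was ACKed. `clear_stamp` selects between the
 * reported behaviour (false) and the expected behaviour (true). */
static void ack_all(struct conn *c, bool clear_stamp)
{
    if (clear_stamp)
        c->retrans_stamp = 0;
}

/* The connection is considered dead once `limit` seconds have elapsed
 * since retrans_stamp. */
static bool timed_out(const struct conn *c, uint32_t now, uint32_t limit)
{
    return c->retrans_stamp != 0 && now - c->retrans_stamp >= limit;
}

/* Replays steps (*1) to (*6); returns true if (*6) ends in a timeout. */
static bool replay(bool clear_stamp)
{
    struct conn c = { 0 };
    const uint32_t limit = 900;        /* hypothetical limit, in seconds */

    retransmit(&c, 1);                 /* (*2) first retransmission, t=1s  */
    ack_all(&c, clear_stamp);          /* (*3) ACK received                */
    /* (*4) 16 minutes (960 s) of silence, then a new packet is sent      */
    retransmit(&c, 962);               /* (*5) retransmission of new data  */
    return timed_out(&c, 962, limit);  /* (*6) timeout check               */
}
```

In this model, replay(false) reports a timeout at the very first retransmission of the new packet because the elapsed time is still measured from t=1s, while replay(true) does not.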

From the history, I learned that a fix was included that should resolve the issue above. The patch is below.

static bool tcp_try_undo_recovery(struct sock *sk)
                  * is ACKed. For Reno it is MUST to prevent false
                  * fast retransmits (RFC2582). SACK TCP is safe. */
                 tcp_moderate_cwnd(tp);
+                if (!tcp_any_retrans_done(sk))
+                        tp->retrans_stamp = 0;
                 return true;
         }

However, after the following fix was introduced:

[net,1/2] tcp: only undo on partial ACKs in CA_Loss

I no longer see retrans_stamp being reset to zero.
Inside tcp_process_loss(), we return through the code path below:

if ((flag & FLAG_SND_UNA_ADVANCED) &&
    tcp_try_undo_loss(sk, false))
        return;
As a result, tp->retrans_stamp is never cleared, because tcp_try_undo_recovery() is never invoked.
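
To make the control flow concrete, here is a toy user-space model of the path described above. It is a sketch under assumptions, not the kernel's code: the toy_* functions, struct toy_sock, its fields, and the flag value are hypothetical stubs, and the real tcp_process_loss() has more branches than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_SND_UNA_ADVANCED 0x1  /* stand-in value, not the kernel's */

/* Hypothetical stand-in for the socket state relevant to this report. */
struct toy_sock {
    uint32_t retrans_stamp;      /* 0 = unset */
    bool     undo_succeeds;      /* stub: whether the loss undo succeeds */
    bool     any_retrans_done;   /* stub for tcp_any_retrans_done() */
};

/* Stub: reports success or failure without touching retrans_stamp. */
static bool toy_try_undo_loss(struct toy_sock *sk)
{
    return sk->undo_succeeds;
}

/* Stub carrying only the two lines added by the patch quoted earlier. */
static bool toy_try_undo_recovery(struct toy_sock *sk)
{
    if (!sk->any_retrans_done)
        sk->retrans_stamp = 0;
    return true;
}

static void toy_process_loss(struct toy_sock *sk, int flag)
{
    if ((flag & FLAG_SND_UNA_ADVANCED) && toy_try_undo_loss(sk))
        return;  /* early return: the clearing code is never reached */
    toy_try_undo_recovery(sk);
}

/* Returns retrans_stamp after one ACK that advances snd_una. */
static uint32_t stamp_after_ack(bool undo_succeeds)
{
    struct toy_sock sk = {
        .retrans_stamp = 100,
        .undo_succeeds = undo_succeeds,
        .any_retrans_done = false,
    };
    toy_process_loss(&sk, FLAG_SND_UNA_ADVANCED);
    return sk.retrans_stamp;
}
```

In this model the stamp stays at its old value whenever the early return is taken, and is cleared only when the call falls through to the recovery stub.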

Is this a known bug in the kernel code, or is it expected behavior?

- Thanks in advance,
Nagaraj