lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240810223130.379146-1-mrzhang97@gmail.com>
Date: Sat, 10 Aug 2024 17:31:30 -0500
From: Mingrui Zhang <mrzhang97@...il.com>
To: edumazet@...gle.com,
	davem@...emloft.net,
	netdev@...r.kernel.org
Cc: Mingrui Zhang <mrzhang97@...il.com>,
	Lisong Xu <xu@....edu>
Subject: [PATCH net] tcp_cubic fix to achieve at least the same throughput as Reno

This patch fixes some CUBIC bugs so that  "CUBIC achieves at least
the same throughput as Reno in small-BDP networks"
[RFC 9438: https://www.rfc-editor.org/rfc/rfc9438.html]

It consists of three bug fixes, all changing function bictcp_update()
of tcp_cubic.c, which controls how fast CUBIC increases its
congestion window size snd_cwnd.

Bug fix 1:
 	ca->ack_cnt += acked;	/* count the number of ACKed packets */
 
 	if (ca->last_cwnd == cwnd &&
-	    (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
+	    (s32)(tcp_jiffies32 - ca->last_time) <= min(HZ / 32, usecs_to_jiffies(ca->delay_min)))
 		return;
 
 	/* The CUBIC function can update ca->cnt at most once per jiffy.

The original code bypasses bictcp_update() under certain conditions
to reduce the CPU overhead. Intuitively, when last_cwnd==cwnd,
bictcp_update() is executed 32 times per second. As a result, 
it is possible that bictcp_update() is not executed for several 
RTTs when RTT is short (specifically < 1/32 second = 31 ms and 
last_cwnd==cwnd which may happen in small-BDP networks), 
thus leading to low throughput in these RTTs.

The patched code executes bictcp_update() 32 times per second
if RTT > 31 ms or every RTT if RTT < 31 ms, when last_cwnd==cwnd.

Bug fix 2:
 	if (tcp_friendliness) {
 		u32 scale = beta_scale;
 
-		delta = (cwnd * scale) >> 3;
+		if (cwnd < ca->bic_origin_point)
+			delta = (cwnd * scale) >> 3;
+		else
+			delta = cwnd;
 		while (ca->ack_cnt > delta) {		/* update tcp cwnd */
 			ca->ack_cnt -= delta;
 			ca->tcp_cwnd++;
 		}

The original code follows RFC 8312 (obsoleted CUBIC RFC).

The patched code follows RFC 9438 (new CUBIC RFC).

"Once _W_est_ has grown to reach the _cwnd_ at the time of most
recently setting _ssthresh_ -- that is, _W_est_ >= _cwnd_prior_ --
the sender SHOULD set α__cubic_ to 1 to ensure that it can achieve
the same congestion window increment rate as Reno, which uses AIMD
(1,0.5)."

Bug fix 3:
-		if (ca->tcp_cwnd > cwnd) {	/* if bic is slower than tcp */
-			delta = ca->tcp_cwnd - cwnd;
+		u32 tcp_cwnd_next_rtt = ca->tcp_cwnd + (ca->ack_cnt + cwnd) / delta;
+
+		if (tcp_cwnd_next_rtt > cwnd) {  /* if bic is slower than tcp */
+			delta = tcp_cwnd_next_rtt - cwnd;
 			max_cnt = cwnd / delta;
 			if (ca->cnt > max_cnt)
 				ca->cnt = max_cnt;
 
The original code estimates RENO snd_cwnd using the estimated 
RENO snd_cwnd at the current time (i.e., tcp_cwnd).

The patched code estimates RENO snd_cwnd using the estimated 
RENO snd_cwnd after one RTT (i.e., tcp_cwnd_next_rtt), 
because ca->cnt is used to increase snd_cwnd for the next RTT.

Experiments:

Below are Mininet experiments to demonstrate the performance difference
between the original CUBIC and patched CUBIC.

Network: link capacity = 100Mbps, RTT = 4ms

TCP flows: one RENO and one CUBIC. initial cwnd = 10 packets.
The first data packet of each flow is lost

snd_cwnd of RENO and original CUBIC flows

https://github.com/zmrui/tcp_cubic_fix/blob/main/reno_and_cubic.jpg

snd_cwnd of RENO and patched CUBIC (with bug fixes 1, 2, and 3) flows.

https://github.com/zmrui/tcp_cubic_fix/blob/main/reno_and_cubic_fixb1b2b3.jpg

The result of patched CUBIC with different combinations of
bug fixes 1, 2, and 3 can be found at the following link,
where you can also find more experiment results.

https://github.com/zmrui/tcp_cubic_fix


Thanks
Mingrui, and Lisong

Signed-off-by: Mingrui Zhang <mrzhang97@...il.com>
Signed-off-by: Lisong Xu <xu@....edu>
---
 net/ipv4/tcp_cubic.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 5dbed91c6178..2b5723194bfa 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -219,7 +219,7 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
 	ca->ack_cnt += acked;	/* count the number of ACKed packets */
 
 	if (ca->last_cwnd == cwnd &&
-	    (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
+	    (s32)(tcp_jiffies32 - ca->last_time) <= min(HZ / 32, usecs_to_jiffies(ca->delay_min)))
 		return;
 
 	/* The CUBIC function can update ca->cnt at most once per jiffy.
@@ -301,14 +301,19 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
 	if (tcp_friendliness) {
 		u32 scale = beta_scale;
 
-		delta = (cwnd * scale) >> 3;
+		if (cwnd < ca->bic_origin_point)
+			delta = (cwnd * scale) >> 3;
+		else
+			delta = cwnd;
 		while (ca->ack_cnt > delta) {		/* update tcp cwnd */
 			ca->ack_cnt -= delta;
 			ca->tcp_cwnd++;
 		}
 
-		if (ca->tcp_cwnd > cwnd) {	/* if bic is slower than tcp */
-			delta = ca->tcp_cwnd - cwnd;
+		u32 tcp_cwnd_next_rtt = ca->tcp_cwnd + (ca->ack_cnt + cwnd) / delta;
+
+		if (tcp_cwnd_next_rtt > cwnd) {  /* if bic is slower than tcp */
+			delta = tcp_cwnd_next_rtt - cwnd;
 			max_cnt = cwnd / delta;
 			if (ca->cnt > max_cnt)
 				ca->cnt = max_cnt;
-- 
2.34.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ