Message-Id: <20240810223130.379146-1-mrzhang97@gmail.com>
Date: Sat, 10 Aug 2024 17:31:30 -0500
From: Mingrui Zhang <mrzhang97@...il.com>
To: edumazet@...gle.com,
davem@...emloft.net,
netdev@...r.kernel.org
Cc: Mingrui Zhang <mrzhang97@...il.com>,
Lisong Xu <xu@....edu>
Subject: [PATCH net] tcp_cubic: fix CUBIC to achieve at least the same throughput as Reno
This patch fixes three CUBIC bugs so that "CUBIC achieves at least
the same throughput as Reno in small-BDP networks"
[RFC 9438: https://www.rfc-editor.org/rfc/rfc9438.html].
All three fixes change function bictcp_update() of tcp_cubic.c,
which controls how fast CUBIC increases its congestion window
snd_cwnd.
Bug fix 1:
ca->ack_cnt += acked; /* count the number of ACKed packets */
if (ca->last_cwnd == cwnd &&
- (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
+ (s32)(tcp_jiffies32 - ca->last_time) <= min_t(u32, HZ / 32, usecs_to_jiffies(ca->delay_min >> 3)))
return;
/* The CUBIC function can update ca->cnt at most once per jiffy.
The original code bypasses bictcp_update() under certain conditions
to reduce CPU overhead: when last_cwnd == cwnd, bictcp_update() runs
at most 32 times per second. As a result, bictcp_update() may not run
at all for several consecutive RTTs when the RTT is short
(specifically RTT < 1/32 second ~= 31 ms, with last_cwnd == cwnd,
which can happen in small-BDP networks), leading to low throughput
during those RTTs.
The patched code runs bictcp_update() at most 32 times per second
when RTT > 31 ms, and about once per RTT when RTT < 31 ms, for
last_cwnd == cwnd. Note that ca->delay_min stores the minimum RTT
in usec << 3, hence the >> 3; min_t() is needed because HZ / 32 is
an int while usecs_to_jiffies() returns an unsigned long.
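To illustrate bug fix 1, here is a minimal user-space sketch (not
kernel code; it assumes HZ = 1000, i.e. 1 jiffy = 1 ms, and made-up
RTT values) of the minimum gap between consecutive bictcp_update()
calls under the old and new conditions:

#include <stdio.h>

#define HZ 1000	/* assumed jiffy rate: 1 jiffy = 1 ms */

int main(void)
{
	unsigned int rtts_ms[] = { 4, 16, 31, 100 };
	unsigned int i;

	for (i = 0; i < sizeof(rtts_ms) / sizeof(rtts_ms[0]); i++) {
		unsigned int rtt = rtts_ms[i];
		/* bictcp_update() is skipped while the elapsed time is
		 * <= the threshold, so consecutive updates are at least
		 * threshold + 1 jiffies apart.
		 */
		unsigned int old_gap = HZ / 32 + 1;	/* 32 ms */
		unsigned int thresh = rtt < HZ / 32 ? rtt : HZ / 32;
		unsigned int new_gap = thresh + 1;	/* min(HZ/32, RTT) + 1 */

		printf("RTT %3u ms: gap old %3u ms (%.2f RTTs), new %3u ms (%.2f RTTs)\n",
		       rtt, old_gap, (double)old_gap / rtt,
		       new_gap, (double)new_gap / rtt);
	}
	return 0;
}

For example, at RTT = 4 ms the old condition allows one update every
8 RTTs, while the new condition allows roughly one update per RTT;
at RTT = 100 ms the two behave the same.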
Bug fix 2:
if (tcp_friendliness) {
u32 scale = beta_scale;
- delta = (cwnd * scale) >> 3;
+ if (cwnd < ca->bic_origin_point)
+ delta = (cwnd * scale) >> 3;
+ else
+ delta = cwnd;
while (ca->ack_cnt > delta) { /* update tcp cwnd */
ca->ack_cnt -= delta;
ca->tcp_cwnd++;
}
The original code follows RFC 8312 (the obsoleted CUBIC RFC).
The patched code follows RFC 9438 (the current CUBIC RFC):
"Once W_est has grown to reach the cwnd at the time of most
recently setting ssthresh -- that is, W_est >= cwnd_prior --
the sender SHOULD set alpha_cubic to 1 to ensure that it can achieve
the same congestion window increment rate as Reno, which uses
AIMD(1, 0.5)."
Bug fix 3:
- if (ca->tcp_cwnd > cwnd) { /* if bic is slower than tcp */
- delta = ca->tcp_cwnd - cwnd;
+ u32 tcp_cwnd_next_rtt = ca->tcp_cwnd + (ca->ack_cnt + cwnd) / delta;
+
+ if (tcp_cwnd_next_rtt > cwnd) { /* if bic is slower than tcp */
+ delta = tcp_cwnd_next_rtt - cwnd;
max_cnt = cwnd / delta;
if (ca->cnt > max_cnt)
ca->cnt = max_cnt;
The original code caps ca->cnt using the estimated Reno snd_cwnd
at the current time (i.e., tcp_cwnd).
The patched code caps ca->cnt using the estimated Reno snd_cwnd
one RTT in the future (i.e., tcp_cwnd_next_rtt),
because ca->cnt controls how snd_cwnd increases during the next RTT.
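A small user-space sketch (not kernel code; the numbers are made up)
of the two checks in a case where they disagree:

#include <stdio.h>

int main(void)
{
	unsigned int cwnd = 100;	/* current snd_cwnd */
	unsigned int tcp_cwnd = 100;	/* estimated Reno cwnd now */
	unsigned int ack_cnt = 50;	/* ACKs not yet applied to tcp_cwnd */
	unsigned int delta = cwnd;	/* ACKs per tcp_cwnd increment (alpha = 1) */
	unsigned int cnt = 250;		/* ca->cnt chosen by the cubic curve */
	unsigned int tcp_cwnd_next_rtt;

	/* Original check: cap ca->cnt only if Reno is already ahead now. */
	if (tcp_cwnd > cwnd)
		printf("original: cap cnt at %u\n", cwnd / (tcp_cwnd - cwnd));
	else
		printf("original: no cap, cnt stays %u\n", cnt);

	/* Patched check: cap ca->cnt if Reno would be ahead one RTT from
	 * now, since cnt governs snd_cwnd growth over the next RTT.
	 * Over that RTT roughly cwnd more ACKs arrive, on top of the
	 * ack_cnt ACKs already banked.
	 */
	tcp_cwnd_next_rtt = tcp_cwnd + (ack_cnt + cwnd) / delta;
	if (tcp_cwnd_next_rtt > cwnd) {
		unsigned int max_cnt = cwnd / (tcp_cwnd_next_rtt - cwnd);

		printf("patched: cap cnt at %u\n",
		       cnt > max_cnt ? max_cnt : cnt);
	} else {
		printf("patched: no cap, cnt stays %u\n", cnt);
	}
	return 0;
}

Here Reno has just caught up (tcp_cwnd == cwnd): the original check
applies no cap, so CUBIC can fall behind Reno during the next RTT,
while the patched check caps ca->cnt at 100, guaranteeing snd_cwnd
grows by at least one packet over that RTT.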
Experiments:
Below are Mininet experiments that demonstrate the performance
difference between the original CUBIC and the patched CUBIC.
Network: link capacity = 100 Mbps, RTT = 4 ms
TCP flows: one Reno and one CUBIC, initial cwnd = 10 packets.
The first data packet of each flow is lost.
snd_cwnd of the Reno and original CUBIC flows:
https://github.com/zmrui/tcp_cubic_fix/blob/main/reno_and_cubic.jpg
snd_cwnd of the Reno and patched CUBIC (with bug fixes 1, 2, and 3)
flows:
https://github.com/zmrui/tcp_cubic_fix/blob/main/reno_and_cubic_fixb1b2b3.jpg
Results for the patched CUBIC with other combinations of bug fixes
1, 2, and 3, along with more experiment results, can be found at:
https://github.com/zmrui/tcp_cubic_fix
Thanks,
Mingrui and Lisong
Signed-off-by: Mingrui Zhang <mrzhang97@...il.com>
Signed-off-by: Lisong Xu <xu@....edu>
---
net/ipv4/tcp_cubic.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 5dbed91c6178..2b5723194bfa 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -219,7 +219,7 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
ca->ack_cnt += acked; /* count the number of ACKed packets */
if (ca->last_cwnd == cwnd &&
- (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
+ (s32)(tcp_jiffies32 - ca->last_time) <= min_t(u32, HZ / 32, usecs_to_jiffies(ca->delay_min >> 3)))
return;
/* The CUBIC function can update ca->cnt at most once per jiffy.
@@ -301,14 +301,19 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
if (tcp_friendliness) {
u32 scale = beta_scale;
- delta = (cwnd * scale) >> 3;
+ if (cwnd < ca->bic_origin_point)
+ delta = (cwnd * scale) >> 3;
+ else
+ delta = cwnd;
while (ca->ack_cnt > delta) { /* update tcp cwnd */
ca->ack_cnt -= delta;
ca->tcp_cwnd++;
}
- if (ca->tcp_cwnd > cwnd) { /* if bic is slower than tcp */
- delta = ca->tcp_cwnd - cwnd;
+ u32 tcp_cwnd_next_rtt = ca->tcp_cwnd + (ca->ack_cnt + cwnd) / delta;
+
+ if (tcp_cwnd_next_rtt > cwnd) { /* if bic is slower than tcp */
+ delta = tcp_cwnd_next_rtt - cwnd;
max_cnt = cwnd / delta;
if (ca->cnt > max_cnt)
ca->cnt = max_cnt;
--
2.34.1