[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2046438615.4484034.1736328888690@mail.yahoo.com>
Date: Wed, 8 Jan 2025 09:34:48 +0000 (UTC)
From: Mahdi Arghavani <ma.arghavani@...oo.com>
To: Neal Cardwell <ncardwell@...gle.com>, Eric Dumazet <edumazet@...gle.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Haibo Zhang <haibo.zhang@...go.ac.nz>,
David Eyers <david.eyers@...go.ac.nz>,
Abbas Arghavani <abbas.arghavani@....se>
Subject: Re: [PATCH net] tcp_cubic: Fix for bug in HyStart implementation in
the Linux kernel
Dear Eric and Neal,
Thank you for the email.
>>> Am I right to say you are referring to
commit 8165a96f6b7122f25bf809aecf06f17b0ec37b58
Yes. The issue arises as a side effect of the changes introduced in this commit.
>>> Please provide a packetdrill test, this will be the most efficient way to demonstrate the issue.
Below are two different methods of demonstrating the issue:A) Demonstrating via the source codeThe changes introduced in commit 8165a9 move the caller of the `bictcp_hystart_reset` function inside the `hystart_update` function.This modification adds an additional condition for triggering the caller, requiring that (tcp_snd_cwnd(tp) >= hystart_low_window) must also be satisfied before invoking `bictcp_hystart_reset`.
B) Demonstrating via a testUnfortunately, I was unable to directly print the value of `ca->round_start` (a variable defined in `tcp_cubic.c`) using packetdrill and provide you with the requested script.Instead, I added a few lines of code to log the status of TCP variables upon packet transmission and ACK reception.To reproduce the same output on your Linux system, you need to apply the changes I made to `tcp_cubic.c` and `tcp_output.c` (see the attached files) and recompile the kernel.I used the attached packetdrill script "only" to emulate data transmission for the test.Below are the logs accumulated in kern.log after running the packetdrill script:
In Line01, the start of the first round is marked by the cubictcp_init function. However, the second round is marked by the reception of the 7th ACK when cwnd is 16 (see Line20).This is incorrect because the 2nd round is started upon receiving the first ACK.This means that `ca->round_start` is updated at t=720994842, while it should have been updated 15.5 ms earlier, at t=720979320.In this test, the length of the ACK train in the second round is calculated to be 15.5 ms shorter, which renders one of HyStart's criteria ineffective.
Line01. 2025-01-08T08:16:23.321839+00:00 h1a kernel: New round is started. t=720873683 Sport=36895 cwnd=10 inflight=0 RTT=100300Line02. 2025-01-08T08:16:23.321842+00:00 h1a kernel: Pkt sending. t=720873878 Sport=36895 cwnd=10 inflight=0 RTT=100300 nextSeq=3915183479Line03. 2025-01-08T08:16:23.321845+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=2 RTT=100300 nextSeq=3915185479Line04. 2025-01-08T08:16:23.321847+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=4 RTT=100300 nextSeq=3915187479Line05. 2025-01-08T08:16:23.321849+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=6 RTT=100300 nextSeq=3915189479Line06. 2025-01-08T08:16:23.427777+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=8 RTT=100300 nextSeq=3915191479Line07. 2025-01-08T08:16:23.427787+00:00 h1a kernel: Ack receiving. t=720979320 Sport=36895 cwnd=10 inflight=9 RTT=100942 acked=1Line08. 2025-01-08T08:16:23.427790+00:00 h1a kernel: Pkt sending. t=720979335 Sport=36895 cwnd=11 inflight=9 RTT=100942 nextSeq=3915193479Line09. 2025-01-08T08:16:23.427792+00:00 h1a kernel: Ack receiving. t=720979421 Sport=36895 cwnd=11 inflight=10 RTT=101517 acked=1Line10. 2025-01-08T08:16:23.432773+00:00 h1a kernel: Pkt sending. t=720979431 Sport=36895 cwnd=12 inflight=10 RTT=101517 nextSeq=3915195479Line11. 2025-01-08T08:16:23.432785+00:00 h1a kernel: Ack receiving. t=720984502 Sport=36895 cwnd=12 inflight=11 RTT=102654 acked=1Line12. 2025-01-08T08:16:23.432788+00:00 h1a kernel: Pkt sending. t=720984514 Sport=36895 cwnd=13 inflight=11 RTT=102654 nextSeq=3915197479Line13. 2025-01-08T08:16:23.432790+00:00 h1a kernel: Ack receiving. t=720984585 Sport=36895 cwnd=13 inflight=12 RTT=103658 acked=1Line14. 2025-01-08T08:16:23.437774+00:00 h1a kernel: Pkt sending. t=720984594 Sport=36895 cwnd=14 inflight=12 RTT=103658 nextSeq=3915199479Line15. 2025-01-08T08:16:23.437783+00:00 h1a kernel: Ack receiving. t=720989668 Sport=36895 cwnd=14 inflight=13 RTT=105172 acked=1Line16. 2025-01-08T08:16:23.437785+00:00 h1a kernel: Pkt sending. t=720989679 Sport=36895 cwnd=15 inflight=13 RTT=105172 nextSeq=3915201479Line17. 2025-01-08T08:16:23.437787+00:00 h1a kernel: Ack receiving. t=720989747 Sport=36895 cwnd=15 inflight=14 RTT=106507 acked=1Line18. 2025-01-08T08:16:23.442773+00:00 h1a kernel: Pkt sending. t=720989757 Sport=36895 cwnd=16 inflight=14 RTT=106507 nextSeq=3915203479Line19. 2025-01-08T08:16:23.442780+00:00 h1a kernel: Ack receiving. t=720994842 Sport=36895 cwnd=16 inflight=15 RTT=108312 acked=1
Line20. 2025-01-08T08:16:23.442782+00:00 h1a kernel: New round is started. t=720994842 Sport=36895 cwnd=16 inflight=15 RTT=108312
Line21. 2025-01-08T08:16:23.442783+00:00 h1a kernel: Pkt sending. t=720994857 Sport=36895 cwnd=17 inflight=15 RTT=108312 nextSeq=3915205479Line22. 2025-01-08T08:16:23.442785+00:00 h1a kernel: Ack receiving. t=720994927 Sport=36895 cwnd=17 inflight=16 RTT=109902 acked=1Line23. 2025-01-08T08:16:23.448788+00:00 h1a kernel: Pkt sending. t=720994936 Sport=36895 cwnd=18 inflight=16 RTT=109902 nextSeq=3915207479Line24. 2025-01-08T08:16:23.448805+00:00 h1a kernel: Ack receiving. t=721000016 Sport=36895 cwnd=18 inflight=17 RTT=111929 acked=1Line25. 2025-01-08T08:16:23.448807+00:00 h1a kernel: Pkt sending. t=721000026 Sport=36895 cwnd=19 inflight=17 RTT=111929 nextSeq=3915209479Line26. 2025-01-08T08:16:23.448808+00:00 h1a kernel: Ack receiving. t=721000100 Sport=36895 cwnd=19 inflight=18 RTT=113713 acked=1Line27. 2025-01-08T08:16:23.496807+00:00 h1a kernel: Pkt sending. t=721000110 Sport=36895 cwnd=20 inflight=18 RTT=113713 nextSeq=3915211479
>>> Note that we are still waiting for an HyStart++ implementation for linux, you may be interested in working on it.
Thank you for the suggestion. I would be happy to work on the HyStart++ implementation for Linux. Could you kindly provide more details on the specific requirements, workflow, and expected outcomes to help me get started?
Best wishes,Mahdi Arghavani
On Monday, January 6, 2025 at 09:24:49 PM GMT+13, Eric Dumazet <edumazet@...gle.com> wrote:
On Mon, Jan 6, 2025 at 5:53 AM Mahdi Arghavani <ma.arghavani@...oo.com> wrote:
>
> Hi,
>
> While refining the source code for our project (SUSS), I discovered a bug in the implementation of HyStart in the Linux kernel, starting from version v5.15.6. The issue, caused by incorrect marking of round starts, results in inaccurate measurement of the length of each ACK train. Since HyStart relies on the length of ACK trains as one of two key criteria to stop exponential cwnd growth during Slow-Start, this inaccuracy renders the criterion ineffective, potentially degrading TCP performance.
>
Hi Mahdi
netdev@ mailing list does not accept HTML messages.
Am I right to say you are referring to
commit 8165a96f6b7122f25bf809aecf06f17b0ec37b58
Author: Eric Dumazet <edumazet@...gle.com>
Date: Tue Nov 23 12:25:35 2021 -0800
tcp_cubic: fix spurious Hystart ACK train detections for
not-cwnd-limited flows
[ Upstream commit 4e1fddc98d2585ddd4792b5e44433dcee7ece001 ]
> Issue Description: The problem arises because the hystart_reset function is not called upon receiving the first ACK (when cwnd=iw=10, see the attached figure). Instead, its invocation is delayed until the condition cwnd >= hystart_low_window is satisfied. In each round, this delay causes:
>
> 1) A postponed marking of the start of a new round.
>
> 2) An incorrect update of ca->end_seq, leading to incorrect marking of the subsequent round.
>
> As a result, the ACK train length is underestimated, which adversely affects HyStart’s first criterion for stopping cwnd exponential growth.
>
> Proposed Solution: Below is a tested patch that addresses the issue by ensuring hystart_reset is triggered appropriately:
>
Please provide a packetdrill test, this will be the most efficient way
to demonstrate the issue.
In general, ACK trains detection is not useful in modern networks,
because of TSO and GRO.
Reference : https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ede656e8465839530c3287c7f54adf75dc2b9563
Note that we are still waiting for an HyStart++ implementation for linux,
you may be interested in working on it.
Thank you.
>
>
> diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
>
> index 5dbed91c6178..78d9cf493ace 100644
>
> --- a/net/ipv4/tcp_cubic.c
>
> +++ b/net/ipv4/tcp_cubic.c
>
> @@ -392,6 +392,9 @@ static void hystart_update(struct sock *sk, u32 delay)
>
> if (after(tp->snd_una, ca->end_seq))
>
> bictcp_hystart_reset(sk);
>
>
>
> + if (tcp_snd_cwnd(tp) < hystart_low_window)
>
> + return;
>
> +
>
> if (hystart_detect & HYSTART_ACK_TRAIN) {
>
> u32 now = bictcp_clock_us(sk);
>
>
>
> @@ -468,8 +471,7 @@ __bpf_kfunc static void cubictcp_acked(struct sock *sk, const struct ack_sample
>
> ca->delay_min = delay;
>
>
>
> /* hystart triggers when cwnd is larger than some threshold */
>
> - if (!ca->found && tcp_in_slow_start(tp) && hystart &&
>
> - tcp_snd_cwnd(tp) >= hystart_low_window)
>
> + if (!ca->found && tcp_in_slow_start(tp) && hystart)
>
> hystart_update(sk, delay);
>
> }
>
> Best wishes,
> Mahdi Arghavani
Content of type "text/html" skipped
View attachment "log.txt" of type "text/plain" (3664 bytes)
Download attachment "tcp_cubic.c" of type "application/octet-stream" (17014 bytes)
Download attachment "tcp_output.c" of type "application/octet-stream" (131118 bytes)
Download attachment "test.pkt" of type "application/octet-stream" (1388 bytes)
Powered by blists - more mailing lists