[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CADVnQymzCpJozeF-wMPbppizg0SUAUufgyQEeD7AB5DZDNBTEw@mail.gmail.com>
Date: Thu, 9 Jan 2025 12:23:39 -0500
From: Neal Cardwell <ncardwell@...gle.com>
To: Mahdi Arghavani <ma.arghavani@...oo.com>
Cc: Eric Dumazet <edumazet@...gle.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Haibo Zhang <haibo.zhang@...go.ac.nz>, David Eyers <david.eyers@...go.ac.nz>,
Abbas Arghavani <abbas.arghavani@....se>
Subject: Re: [PATCH net] tcp_cubic: Fix for bug in HyStart implementation in
the Linux kernel
On Wed, Jan 8, 2025 at 4:36 AM Mahdi Arghavani <ma.arghavani@...oo.com> wrote:
>
> Dear Eric and Neal,
>
> Thank you for the email.
Please use plain text email so that your emails will be forwarded by
the netdev mail server. :-)
> >>> Am I right to say you are referring to
> commit 8165a96f6b7122f25bf809aecf06f17b0ec37b58
>
> Yes. The issue arises as a side effect of the changes introduced in this commit.
>
> >>> Please provide a packetdrill test, this will be the most efficient way to demonstrate the issue.
>
> Below are two different methods of demonstrating the issue:
> A) Demonstrating via the source code
> The changes introduced in commit 8165a9 move the caller of the `bictcp_hystart_reset` function inside the `hystart_update` function.
> This modification adds an additional condition for triggering the caller, requiring that (tcp_snd_cwnd(tp) >= hystart_low_window) must also be satisfied before invoking `bictcp_hystart_reset`.
Thanks for the nice analysis.
Looks like 8165a96f6b7122f25bf809aecf06f17b0ec37b58 is a stable
branch fix, and the original commit is:
4e1fddc98d2585ddd4792b5e44433dcee7ece001
So the ultimate patch to fix this can use a Fixes tag like:
Fixes: 4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train
detections for not-cwnd-limited flows")
Please also move the "hystart triggers when cwnd is larger than some
threshold" comment to the line above where you have moved the logic:
So the patch reads something like:
@@ -392,6 +392,9 @@ static void hystart_update(struct sock *sk, u32 delay)
if (after(tp->snd_una, ca->end_seq))
bictcp_hystart_reset(sk);
+ /* hystart triggers when cwnd is larger than some threshold */
+ if (tcp_snd_cwnd(tp) < hystart_low_window)
+ return;
+
...
- /* hystart triggers when cwnd is larger than some threshold */
- if (!ca->found && tcp_in_slow_start(tp) && hystart &&
- tcp_snd_cwnd(tp) >= hystart_low_window)
+ if (!ca->found && tcp_in_slow_start(tp) && hystart)
hystart_update(sk, delay);
}
> B) Demonstrating via a test
> Unfortunately, I was unable to directly print the value of `ca->round_start` (a variable defined in `tcp_cubic.c`) using packetdrill and provide you with the requested script.
> Instead, I added a few lines of code to log the status of TCP variables upon packet transmission and ACK reception.
> To reproduce the same output on your Linux system, you need to apply the changes I made to `tcp_cubic.c` and `tcp_output.c` (see the attached files) and recompile the kernel.
> I used the attached packetdrill script "only" to emulate data transmission for the test.
> Below are the logs accumulated in kern.log after running the packetdrill script:
>
> In Line01, the start of the first round is marked by the cubictcp_init function. However, the second round is marked by the reception of the 7th ACK when cwnd is 16 (see Line20).
> This is incorrect because the 2nd round is started upon receiving the first ACK.
> This means that `ca->round_start` is updated at t=720994842, while it should have been updated 15.5 ms earlier, at t=720979320.
> In this test, the length of the ACK train in the second round is calculated to be 15.5 ms shorter, which renders one of HyStart's criteria ineffective.
>
> Line01. 2025-01-08T08:16:23.321839+00:00 h1a kernel: New round is started. t=720873683 Sport=36895 cwnd=10 inflight=0 RTT=100300
> Line02. 2025-01-08T08:16:23.321842+00:00 h1a kernel: Pkt sending. t=720873878 Sport=36895 cwnd=10 inflight=0 RTT=100300 nextSeq=3915183479
> Line03. 2025-01-08T08:16:23.321845+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=2 RTT=100300 nextSeq=3915185479
> Line04. 2025-01-08T08:16:23.321847+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=4 RTT=100300 nextSeq=3915187479
> Line05. 2025-01-08T08:16:23.321849+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=6 RTT=100300 nextSeq=3915189479
> Line06. 2025-01-08T08:16:23.427777+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=8 RTT=100300 nextSeq=3915191479
> Line07. 2025-01-08T08:16:23.427787+00:00 h1a kernel: Ack receiving. t=720979320 Sport=36895 cwnd=10 inflight=9 RTT=100942 acked=1
> Line08. 2025-01-08T08:16:23.427790+00:00 h1a kernel: Pkt sending. t=720979335 Sport=36895 cwnd=11 inflight=9 RTT=100942 nextSeq=3915193479
> Line09. 2025-01-08T08:16:23.427792+00:00 h1a kernel: Ack receiving. t=720979421 Sport=36895 cwnd=11 inflight=10 RTT=101517 acked=1
> Line10. 2025-01-08T08:16:23.432773+00:00 h1a kernel: Pkt sending. t=720979431 Sport=36895 cwnd=12 inflight=10 RTT=101517 nextSeq=3915195479
> Line11. 2025-01-08T08:16:23.432785+00:00 h1a kernel: Ack receiving. t=720984502 Sport=36895 cwnd=12 inflight=11 RTT=102654 acked=1
> Line12. 2025-01-08T08:16:23.432788+00:00 h1a kernel: Pkt sending. t=720984514 Sport=36895 cwnd=13 inflight=11 RTT=102654 nextSeq=3915197479
> Line13. 2025-01-08T08:16:23.432790+00:00 h1a kernel: Ack receiving. t=720984585 Sport=36895 cwnd=13 inflight=12 RTT=103658 acked=1
> Line14. 2025-01-08T08:16:23.437774+00:00 h1a kernel: Pkt sending. t=720984594 Sport=36895 cwnd=14 inflight=12 RTT=103658 nextSeq=3915199479
> Line15. 2025-01-08T08:16:23.437783+00:00 h1a kernel: Ack receiving. t=720989668 Sport=36895 cwnd=14 inflight=13 RTT=105172 acked=1
> Line16. 2025-01-08T08:16:23.437785+00:00 h1a kernel: Pkt sending. t=720989679 Sport=36895 cwnd=15 inflight=13 RTT=105172 nextSeq=3915201479
> Line17. 2025-01-08T08:16:23.437787+00:00 h1a kernel: Ack receiving. t=720989747 Sport=36895 cwnd=15 inflight=14 RTT=106507 acked=1
> Line18. 2025-01-08T08:16:23.442773+00:00 h1a kernel: Pkt sending. t=720989757 Sport=36895 cwnd=16 inflight=14 RTT=106507 nextSeq=3915203479
> Line19. 2025-01-08T08:16:23.442780+00:00 h1a kernel: Ack receiving. t=720994842 Sport=36895 cwnd=16 inflight=15 RTT=108312 acked=1
>
> Line20. 2025-01-08T08:16:23.442782+00:00 h1a kernel: New round is started. t=720994842 Sport=36895 cwnd=16 inflight=15 RTT=108312
>
> Line21. 2025-01-08T08:16:23.442783+00:00 h1a kernel: Pkt sending. t=720994857 Sport=36895 cwnd=17 inflight=15 RTT=108312 nextSeq=3915205479
> Line22. 2025-01-08T08:16:23.442785+00:00 h1a kernel: Ack receiving. t=720994927 Sport=36895 cwnd=17 inflight=16 RTT=109902 acked=1
> Line23. 2025-01-08T08:16:23.448788+00:00 h1a kernel: Pkt sending. t=720994936 Sport=36895 cwnd=18 inflight=16 RTT=109902 nextSeq=3915207479
> Line24. 2025-01-08T08:16:23.448805+00:00 h1a kernel: Ack receiving. t=721000016 Sport=36895 cwnd=18 inflight=17 RTT=111929 acked=1
> Line25. 2025-01-08T08:16:23.448807+00:00 h1a kernel: Pkt sending. t=721000026 Sport=36895 cwnd=19 inflight=17 RTT=111929 nextSeq=3915209479
> Line26. 2025-01-08T08:16:23.448808+00:00 h1a kernel: Ack receiving. t=721000100 Sport=36895 cwnd=19 inflight=18 RTT=113713 acked=1
> Line27. 2025-01-08T08:16:23.496807+00:00 h1a kernel: Pkt sending. t=721000110 Sport=36895 cwnd=20 inflight=18 RTT=113713 nextSeq=3915211479
To create a packetdrill test, you don't need to print round_start. You
can simply construct a packetdrill scenario and assert that
tcpi_snd_cwnd and tcpi_snd_ssthresh change in the expected ways over
the course of the test, as packetdrill injects ACKs into your kernel
under test.
I have upstreamed our packetdrill tests for TCP CUBIC today, so you
can have some examples to look at. I recommend looking at the
gtests/net/tcp/cubic/cubic-hystart-delay-rtt-jumps-upward.pkt file in
this patch:
https://github.com/google/packetdrill/commit/8d63bbc7d6273f86e826ac16dbc3c38d4a41b129#diff-d7a68a37bc59309d374f8b97abcd406e263980415dd5af5c68db23f90f2d21a6
Before sending your patch to the list, please:
+ Download and build packetdrill. For tips on using packetdrill, you
can start with:
https://github.com/google/packetdrill
+ run all cubic packetdrill tests, to help test that your commit does
not introduce any bugs:
cd ~/packetdrill/gtests/net/
./packetdrill/run_all.py -S -v -L -l tcp/cubic/
+ read: https://www.kernel.org/doc/html/v6.11/process/maintainer-netdev.html
+ run something like the following to verify the format of the patch
git format-patch --subject-prefix='PATCH net' HEAD~1..HEAD
./scripts/checkpatch.pl 00*patch
When all the warnings have been resolved, you can send the patch to
the list. :-)
> >>> Note that we are still waiting for an HyStart++ implementation for linux, you may be interested in working on it.
>
> Thank you for the suggestion. I would be happy to work on the HyStart++ implementation for Linux. Could you kindly provide more details on the specific requirements, workflow, and expected outcomes to help me get started?
The specific requirements are in the Hystart++ RFC:
https://datatracker.ietf.org/doc/html/rfc9406
The workflow would be to develop the code, run your kernel to test it
with packetdrill and test transfers in a controlled setting, then send
the patches to the netdev list for review.
The expected outcome would be to come up with working patches that are
readable, pass ./scripts/checkpatch.pl, compile and pass packetdrill
cubic tests, and produce improved behavior in at least some test
(probably a test where the Hystart++ implementation prevents a
spurious exit of slow-start when min_rtt jumps upward, which is common
in cellular/wifi cases).
thanks,
neal
> Best wishes,
> Mahdi Arghavani
>
> On Monday, January 6, 2025 at 09:24:49 PM GMT+13, Eric Dumazet <edumazet@...gle.com> wrote:
>
>
> On Mon, Jan 6, 2025 at 5:53 AM Mahdi Arghavani <ma.arghavani@...oo.com> wrote:
> >
> > Hi,
> >
> > While refining the source code for our project (SUSS), I discovered a bug in the implementation of HyStart in the Linux kernel, starting from version v5.15.6. The issue, caused by incorrect marking of round starts, results in inaccurate measurement of the length of each ACK train. Since HyStart relies on the length of ACK trains as one of two key criteria to stop exponential cwnd growth during Slow-Start, this inaccuracy renders the criterion ineffective, potentially degrading TCP performance.
> >
>
> Hi Mahdi
>
> netdev@ mailing list does not accept HTML messages.
>
> Am I right to say you are referring to
>
> commit 8165a96f6b7122f25bf809aecf06f17b0ec37b58
> Author: Eric Dumazet <edumazet@...gle.com>
> Date: Tue Nov 23 12:25:35 2021 -0800
>
> tcp_cubic: fix spurious Hystart ACK train detections for
> not-cwnd-limited flows
>
> [ Upstream commit 4e1fddc98d2585ddd4792b5e44433dcee7ece001 ]
>
>
>
> > Issue Description: The problem arises because the hystart_reset function is not called upon receiving the first ACK (when cwnd=iw=10, see the attached figure). Instead, its invocation is delayed until the condition cwnd >= hystart_low_window is satisfied. In each round, this delay causes:
> >
> > 1) A postponed marking of the start of a new round.
> >
> > 2) An incorrect update of ca->end_seq, leading to incorrect marking of the subsequent round.
> >
> > As a result, the ACK train length is underestimated, which adversely affects HyStart’s first criterion for stopping cwnd exponential growth.
> >
> > Proposed Solution: Below is a tested patch that addresses the issue by ensuring hystart_reset is triggered appropriately:
> >
>
>
>
> Please provide a packetdrill test, this will be the most efficient way
> to demonstrate the issue.
>
> In general, ACK trains detection is not useful in modern networks,
> because of TSO and GRO.
>
> Reference : https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ede656e8465839530c3287c7f54adf75dc2b9563
>
> Note that we are still waiting for an HyStart++ implementation for linux,
> you may be interested in working on it.
>
> Thank you.
>
>
> >
> >
> > diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
> >
> > index 5dbed91c6178..78d9cf493ace 100644
> >
> > --- a/net/ipv4/tcp_cubic.c
> >
> > +++ b/net/ipv4/tcp_cubic.c
> >
> > @@ -392,6 +392,9 @@ static void hystart_update(struct sock *sk, u32 delay)
> >
> > if (after(tp->snd_una, ca->end_seq))
> >
> > bictcp_hystart_reset(sk);
> >
> >
> >
> > + if (tcp_snd_cwnd(tp) < hystart_low_window)
> >
> > + return;
> >
> > +
> >
> > if (hystart_detect & HYSTART_ACK_TRAIN) {
> >
> > u32 now = bictcp_clock_us(sk);
> >
> >
> >
> > @@ -468,8 +471,7 @@ __bpf_kfunc static void cubictcp_acked(struct sock *sk, const struct ack_sample
> >
> > ca->delay_min = delay;
> >
> >
> >
> > /* hystart triggers when cwnd is larger than some threshold */
> >
> > - if (!ca->found && tcp_in_slow_start(tp) && hystart &&
> >
> > - tcp_snd_cwnd(tp) >= hystart_low_window)
> >
> > + if (!ca->found && tcp_in_slow_start(tp) && hystart)
> >
> > hystart_update(sk, delay);
> >
> > }
> >
> > Best wishes,
> > Mahdi Arghavani
Powered by blists - more mailing lists