lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <979088118.32930.1736903937403@mail.yahoo.com>
Date: Wed, 15 Jan 2025 01:18:57 +0000 (UTC)
From: Mahdi Arghavani <ma.arghavani@...oo.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Eric Dumazet <edumazet@...gle.com>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, 
	Haibo Zhang <haibo.zhang@...go.ac.nz>, 
	David Eyers <david.eyers@...go.ac.nz>, 
	Abbas Arghavani <abbas.arghavani@....se>
Subject: Re: [PATCH net] tcp_cubic: Fix for bug in HyStart implementation in
 the Linux kernel

Hi Neal,
I appreciate your guidance in submitting the bug fix.
 Attached are my updated packetdrill tests.

Best wishes,
Mahdi






On Monday, January 13, 2025 at 04:59:15 AM GMT+13, Neal Cardwell <ncardwell@...gle.com> wrote: 





On Sun, Jan 12, 2025 at 12:47 AM Mahdi Arghavani <ma.arghavani@...oo.com> wrote:
>
>  Hi,
> Thank you for your email.
> Following your suggestion, I downloaded the latest packetdrill tests for CUBIC. Attached is a new script to test the HYSTART-ACK-TRAIN detection mechanism.

Great. Thanks for attaching your test.

> Additionally, it’s a good idea to set the hystart_detect parameter to 2 (instead of 3) in the "cubic-hystart-delay-*.pkt" files.

I would argue that:

(1) Ideally, Hystart-delay tests should be run with both of the
following configurations: (a)
/sys/module/tcp_cubic/parameters/hystart_detect set to only enable
Hystart-delay; (b) /sys/module/tcp_cubic/parameters/hystart_detect set
to enable both Hystart-delay and Hystart-ack-train. (Since those are
both supported configurations of the kernel, and we want to make sure
they function correctly (the two flavors of Hystart don't interfere
with each other, and don't crash or have races or violate memory
safety invariants, etc). Running both (a) and (b) is how we run them
internally, but there's no support yet in the public packetdrill test
harness to run tests in two different configurations like that.

(2) If Hystart-delay tests are only run in one configuration, then
IMHO they should be run with
/sys/module/tcp_cubic/parameters/hystart_detect set to enable both
Hystart-delay and Hystart-ack-train, since that is the default
configuration of the kernel, and the one that the vast majority of
users will thus use.

> I recompiled the Linux kernel with the patch we both agreed on in the previous emails. However, I found that the fix passes all tests except for the following:
> 1) tcp/cubic/cubic-bulk-166k-idle-restart.pkt
> 2) tcp/cubic/cubic-bulk-166k.pkt
>
> This is because these two tests assume that slow-start ends when cwnd = 48, which is not correct. I will work on these two tests and get back to you soon.

Sounds great.

Do you mind sharing your patch as well (either as an attachment or
directly to the list via "git send-email", whichever you prefer at
this stage)? That way we can start offering feedback on the kernel
patch itself.

thanks,
neal




> Best Wishes,
> Mahdi Arghavani
>
>
>
> On Friday, January 10, 2025 at 06:23:58 AM GMT+13, Neal Cardwell <ncardwell@...gle.com> wrote:
>
>
> On Wed, Jan 8, 2025 at 4:36 AM Mahdi Arghavani <ma.arghavani@...oo.com> wrote:
> >
> > Dear Eric and Neal,
> >
> > Thank you for the email.
>
> Please use plain text email so that your emails will be forwarded by
> the netdev mail server. :-)
>
> > >>> Am I right to say you are referring to
> > commit 8165a96f6b7122f25bf809aecf06f17b0ec37b58
> >
> > Yes. The issue arises as a side effect of the changes introduced in this commit.
> >
> > >>> Please provide a packetdrill test, this will be the most efficient way to demonstrate the issue.
> >
> > Below are two different methods of demonstrating the issue:
> > A) Demonstrating via the source code
> > The changes introduced in commit 8165a9 move the caller of the `bictcp_hystart_reset` function inside the `hystart_update` function.
> > This modification adds an additional condition for triggering the caller, requiring that (tcp_snd_cwnd(tp) >= hystart_low_window) must also be satisfied before invoking `bictcp_hystart_reset`.
>
> Thanks for the nice analysis.
>
> Looks like 8165a96f6b7122f25bf809aecf06f17b0ec37b58  is a stable
> branch fix, and the original commit is:
> 4e1fddc98d2585ddd4792b5e44433dcee7ece001
>
> So the ultimate patch to fix this can use a Fixes tag like:
>
> Fixes: 4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train
> detections for not-cwnd-limited flows")
>
> Please also move the "hystart triggers when cwnd is larger than some
> threshold" comment to the line above where you have moved the logic:
>
> So the patch reads something like:
>
> @@ -392,6 +392,9 @@ static void hystart_update(struct sock *sk, u32 delay)
>        if (after(tp->snd_una, ca->end_seq))
>                bictcp_hystart_reset(sk);
>
> +      /* hystart triggers when cwnd is larger than some threshold */
> +      if (tcp_snd_cwnd(tp) < hystart_low_window)
> +              return;
> +
> ...
> -      /* hystart triggers when cwnd is larger than some threshold */
> -      if (!ca->found && tcp_in_slow_start(tp) && hystart &&
> -          tcp_snd_cwnd(tp) >= hystart_low_window)
> +      if (!ca->found && tcp_in_slow_start(tp) && hystart)
>                hystart_update(sk, delay);
> }
>
> > B) Demonstrating via a test
> > Unfortunately, I was unable to directly print the value of `ca->round_start` (a variable defined in `tcp_cubic.c`) using packetdrill and provide you with the requested script.
> > Instead, I added a few lines of code to log the status of TCP variables upon packet transmission and ACK reception.
> > To reproduce the same output on your Linux system, you need to apply the changes I made to `tcp_cubic.c` and `tcp_output.c` (see the attached files) and recompile the kernel.
> > I used the attached packetdrill script "only" to emulate data transmission for the test.
> > Below are the logs accumulated in kern.log after running the packetdrill script:
> >
> > In Line01, the start of the first round is marked by the cubictcp_init function. However, the second round is marked by the reception of the 7th ACK when cwnd is 16 (see Line20).
> > This is incorrect because the 2nd round is started upon receiving the first ACK.
> > This means that `ca->round_start` is updated at t=720994842, while it should have been updated 15.5 ms earlier, at t=720979320.
> > In this test, the length of the ACK train in the second round is calculated to be 15.5 ms shorter, which renders one of HyStart's criteria ineffective.
> >
> > Line01. 2025-01-08T08:16:23.321839+00:00 h1a kernel: New round is started. t=720873683 Sport=36895 cwnd=10 inflight=0 RTT=100300
> > Line02. 2025-01-08T08:16:23.321842+00:00 h1a kernel: Pkt sending. t=720873878 Sport=36895 cwnd=10 inflight=0 RTT=100300 nextSeq=3915183479
> > Line03. 2025-01-08T08:16:23.321845+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=2 RTT=100300 nextSeq=3915185479
> > Line04. 2025-01-08T08:16:23.321847+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=4 RTT=100300 nextSeq=3915187479
> > Line05. 2025-01-08T08:16:23.321849+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=6 RTT=100300 nextSeq=3915189479
> > Line06. 2025-01-08T08:16:23.427777+00:00 h1a kernel: Pkt sending. t=720873896 Sport=36895 cwnd=10 inflight=8 RTT=100300 nextSeq=3915191479
> > Line07. 2025-01-08T08:16:23.427787+00:00 h1a kernel: Ack receiving. t=720979320 Sport=36895 cwnd=10 inflight=9 RTT=100942 acked=1
> > Line08. 2025-01-08T08:16:23.427790+00:00 h1a kernel: Pkt sending. t=720979335 Sport=36895 cwnd=11 inflight=9 RTT=100942 nextSeq=3915193479
> > Line09. 2025-01-08T08:16:23.427792+00:00 h1a kernel: Ack receiving. t=720979421 Sport=36895 cwnd=11 inflight=10 RTT=101517 acked=1
> > Line10. 2025-01-08T08:16:23.432773+00:00 h1a kernel: Pkt sending. t=720979431 Sport=36895 cwnd=12 inflight=10 RTT=101517 nextSeq=3915195479
> > Line11. 2025-01-08T08:16:23.432785+00:00 h1a kernel: Ack receiving. t=720984502 Sport=36895 cwnd=12 inflight=11 RTT=102654 acked=1
> > Line12. 2025-01-08T08:16:23.432788+00:00 h1a kernel: Pkt sending. t=720984514 Sport=36895 cwnd=13 inflight=11 RTT=102654 nextSeq=3915197479
> > Line13. 2025-01-08T08:16:23.432790+00:00 h1a kernel: Ack receiving. t=720984585 Sport=36895 cwnd=13 inflight=12 RTT=103658 acked=1
> > Line14. 2025-01-08T08:16:23.437774+00:00 h1a kernel: Pkt sending. t=720984594 Sport=36895 cwnd=14 inflight=12 RTT=103658 nextSeq=3915199479
> > Line15. 2025-01-08T08:16:23.437783+00:00 h1a kernel: Ack receiving. t=720989668 Sport=36895 cwnd=14 inflight=13 RTT=105172 acked=1
> > Line16. 2025-01-08T08:16:23.437785+00:00 h1a kernel: Pkt sending. t=720989679 Sport=36895 cwnd=15 inflight=13 RTT=105172 nextSeq=3915201479
> > Line17. 2025-01-08T08:16:23.437787+00:00 h1a kernel: Ack receiving. t=720989747 Sport=36895 cwnd=15 inflight=14 RTT=106507 acked=1
> > Line18. 2025-01-08T08:16:23.442773+00:00 h1a kernel: Pkt sending. t=720989757 Sport=36895 cwnd=16 inflight=14 RTT=106507 nextSeq=3915203479
> > Line19. 2025-01-08T08:16:23.442780+00:00 h1a kernel: Ack receiving. t=720994842 Sport=36895 cwnd=16 inflight=15 RTT=108312 acked=1
> >
> > Line20. 2025-01-08T08:16:23.442782+00:00 h1a kernel: New round is started. t=720994842 Sport=36895 cwnd=16 inflight=15 RTT=108312
> >
> > Line21. 2025-01-08T08:16:23.442783+00:00 h1a kernel: Pkt sending. t=720994857 Sport=36895 cwnd=17 inflight=15 RTT=108312 nextSeq=3915205479
> > Line22. 2025-01-08T08:16:23.442785+00:00 h1a kernel: Ack receiving. t=720994927 Sport=36895 cwnd=17 inflight=16 RTT=109902 acked=1
> > Line23. 2025-01-08T08:16:23.448788+00:00 h1a kernel: Pkt sending. t=720994936 Sport=36895 cwnd=18 inflight=16 RTT=109902 nextSeq=3915207479
> > Line24. 2025-01-08T08:16:23.448805+00:00 h1a kernel: Ack receiving. t=721000016 Sport=36895 cwnd=18 inflight=17 RTT=111929 acked=1
> > Line25. 2025-01-08T08:16:23.448807+00:00 h1a kernel: Pkt sending. t=721000026 Sport=36895 cwnd=19 inflight=17 RTT=111929 nextSeq=3915209479
> > Line26. 2025-01-08T08:16:23.448808+00:00 h1a kernel: Ack receiving. t=721000100 Sport=36895 cwnd=19 inflight=18 RTT=113713 acked=1
> > Line27. 2025-01-08T08:16:23.496807+00:00 h1a kernel: Pkt sending. t=721000110 Sport=36895 cwnd=20 inflight=18 RTT=113713 nextSeq=3915211479
>
> To create a packetdrill test, you don't need to print round_start. You
> can simply construct a packetdrill scenario and assert that
> tcpi_snd_cwnd and tcpi_snd_ssthresh change in the expected ways over
> the course of the test, as packetdrill injects ACKs into your kernel
> under test.
>
> I have upstreamed our packetdrill tests for TCP CUBIC today, so you
> can have some examples to look at. I recommend looking at the
> gtests/net/tcp/cubic/cubic-hystart-delay-rtt-jumps-upward.pkt file in
> this patch:
>
> https://github.com/google/packetdrill/commit/8d63bbc7d6273f86e826ac16dbc3c38d4a41b129#diff-d7a68a37bc59309d374f8b97abcd406e263980415dd5af5c68db23f90f2d21a6
>
> Before sending your patch to the list, please:
>
> + Download and build packetdrill. For tips on using packetdrill, you
> can start with:
>
> https://github.com/google/packetdrill
>
> + run all cubic packetdrill tests, to help test that your commit does
> not introduce any bugs:
>
> cd ~/packetdrill/gtests/net/
> ./packetdrill/run_all.py -S -v -L -l tcp/cubic/
>
> + read: https://www.kernel.org/doc/html/v6.11/process/maintainer-netdev.html
>
> + run something like the following to verify the format of the patch
>
> git format-patch --subject-prefix='PATCH net' HEAD~1..HEAD
> ./scripts/checkpatch.pl 00*patch
>
> When all the warnings have been resolved, you can send the patch to
> the list. :-)
>
> > >>> Note that we are still waiting for an HyStart++ implementation for linux, you may be interested in working on it.
> >
> > Thank you for the suggestion. I would be happy to work on the HyStart++ implementation for Linux. Could you kindly provide more details on the specific requirements, workflow, and expected outcomes to help me get started?
>
> The specific requirements are in the Hystart++ RFC:
>  https://datatracker.ietf.org/doc/html/rfc9406
>
> The workflow would be to develop the code, run your kernel to test it
> with packetdrill and test transfers in a controlled setting, then send
> the patches to the netdev list for review.
>
> The expected outcome would be to come up with working patches that are
> readable, pass ./scripts/checkpatch.pl, compile and pass packetdrill
> cubic tests, and produce improved behavior in at least some test
> (probably a test where the Hystart++ implementation prevents a
> spurious exit of slow-start when min_rtt jumps upward, which is common
> in cellular/wifi cases).
>
> thanks,
> neal
>
>
> > Best wishes,
> > Mahdi Arghavani
> >
> > On Monday, January 6, 2025 at 09:24:49 PM GMT+13, Eric Dumazet <edumazet@...gle.com> wrote:
> >
> >
> > On Mon, Jan 6, 2025 at 5:53 AM Mahdi Arghavani <ma.arghavani@...oo.com> wrote:
> > >
> > > Hi,
> > >
> > > While refining the source code for our project (SUSS), I discovered a bug in the implementation of HyStart in the Linux kernel, starting from version v5.15.6. The issue, caused by incorrect marking of round starts, results in inaccurate measurement of the length of each ACK train. Since HyStart relies on the length of ACK trains as one of two key criteria to stop exponential cwnd growth during Slow-Start, this inaccuracy renders the criterion ineffective, potentially degrading TCP performance.
> > >
> >
> > Hi Mahdi
> >
> > netdev@ mailing list does not accept HTML messages.
> >
> > Am I right to say you are referring to
> >
> > commit 8165a96f6b7122f25bf809aecf06f17b0ec37b58
> > Author: Eric Dumazet <edumazet@...gle.com>
> > Date:  Tue Nov 23 12:25:35 2021 -0800
> >
> >    tcp_cubic: fix spurious Hystart ACK train detections for
> > not-cwnd-limited flows
> >
> >    [ Upstream commit 4e1fddc98d2585ddd4792b5e44433dcee7ece001 ]
> >
> >
> >
> > > Issue Description: The problem arises because the hystart_reset function is not called upon receiving the first ACK (when cwnd=iw=10, see the attached figure). Instead, its invocation is delayed until the condition cwnd >= hystart_low_window is satisfied. In each round, this delay causes:
> > >
> > > 1) A postponed marking of the start of a new round.
> > >
> > > 2) An incorrect update of ca->end_seq, leading to incorrect marking of the subsequent round.
> > >
> > > As a result, the ACK train length is underestimated, which adversely affects HyStart’s first criterion for stopping cwnd exponential growth.
> > >
> > > Proposed Solution: Below is a tested patch that addresses the issue by ensuring hystart_reset is triggered appropriately:
> > >
> >
> >
> >
> > Please provide a packetdrill test, this will be the most efficient way
> > to demonstrate the issue.
> >
> > In general, ACK trains detection is not useful in modern networks,
> > because of TSO and GRO.
> >
> > Reference : https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ede656e8465839530c3287c7f54adf75dc2b9563
> >
> > Note that we are still waiting for an HyStart++ implementation for linux,
> > you may be interested in working on it.
> >
> > Thank you.
> >
> >
> > >
> > >
> > > diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
> > >
> > > index 5dbed91c6178..78d9cf493ace 100644
> > >
> > > --- a/net/ipv4/tcp_cubic.c
> > >
> > > +++ b/net/ipv4/tcp_cubic.c
> > >
> > > @@ -392,6 +392,9 @@ static void hystart_update(struct sock *sk, u32 delay)
> > >
> > >        if (after(tp->snd_una, ca->end_seq))
> > >
> > >                bictcp_hystart_reset(sk);
> > >
> > >
> > >
> > > +      if (tcp_snd_cwnd(tp) < hystart_low_window)
> > >
> > > +              return;
> > >
> > > +
> > >
> > >        if (hystart_detect & HYSTART_ACK_TRAIN) {
> > >
> > >                u32 now = bictcp_clock_us(sk);
> > >
> > >
> > >
> > > @@ -468,8 +471,7 @@ __bpf_kfunc static void cubictcp_acked(struct sock *sk, const struct ack_sample
> > >
> > >                ca->delay_min = delay;
> > >
> > >
> > >
> > >        /* hystart triggers when cwnd is larger than some threshold */
> > >
> > > -      if (!ca->found && tcp_in_slow_start(tp) && hystart &&
> > >
> > > -          tcp_snd_cwnd(tp) >= hystart_low_window)
> > >
> > > +      if (!ca->found && tcp_in_slow_start(tp) && hystart)
> > >
> > >                hystart_update(sk, delay);
> > >
> > >  }
> > >
> > > Best wishes,
> > > Mahdi Arghavani
Download attachment "cubic-hystart-length-of-ack-train.pkt" of type "application/octet-stream" (3347 bytes)

Download attachment "new-cubic-bulk-166k-idle-restart.pkt" of type "application/octet-stream" (8166 bytes)

Download attachment "new-cubic-bulk-166k.pkt" of type "application/octet-stream" (12538 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ