netdev - Re: [PATCH net] tcp_cubic fix to achieve at least the same throughput as Reno

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CADVnQykdo-EyGeZxPjLEmAFcT9HyNU5ijvV53urHRcobhOLJHw@mail.gmail.com>
Date: Tue, 13 Aug 2024 21:59:31 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Mingrui Zhang <mrzhang97@...il.com>
Cc: edumazet@...gle.com, davem@...emloft.net, netdev@...r.kernel.org, 
	Lisong Xu <xu@....edu>
Subject: Re: [PATCH net] tcp_cubic fix to achieve at least the same throughput
 as Reno

On Sat, Aug 10, 2024 at 6:34 PM Mingrui Zhang <mrzhang97@...il.com> wrote:
>
> This patch fixes some CUBIC bugs so that  "CUBIC achieves at least
> the same throughput as Reno in small-BDP networks"
> [RFC 9438: https://www.rfc-editor.org/rfc/rfc9438.html]
>
> It consists of three bug fixes, all changing function bictcp_update()
> of tcp_cubic.c, which controls how fast CUBIC increases its
> congestion window size snd_cwnd.

Thanks for these fixes! I had some patches under development for bugs
(2) and (3), but had not found the time to finish them up and send
them out... And I had not noticed bug (1).  :-) So thanks for sending
out these fixes! :-)

A few comments:

(1) Since these are three separate bug fixes, in accordance with Linux
development standards, which use minimal/narrow/focused patches,
please submit each of the 3 separate fixes as a separate patch, as
part of a patch series (ideally using git format-patch).

(2) I would suggest using the --cover-letter flag to git format-patch
to create a cover letter for the 3-patch series. You could put the
nice experiment data in the cover letter, since the cover letter
applies to all patches, and so do the experiment results.

(3) Please include a "Fixes:" footer in the git commit message
footeres, to indicate which patch this is fixing, to enable backports
of these fixes to stable kernels. You can see examples in the Linux
git history. For example, you might use something like:

Fixes: df3271f3361b ("[TCP] BIC: CUBIC window growth (2.0)")

You can use the following to find the SHA1 of the patch you are fixing:

   git blame -- net/ipv4/tcp_cubic.c

(4) For each patch title, please use "tcp_cubic:" as the first token
in the patch title to indicate the area of the kernel you are fixing,
and have a brief description of the specifics of the fix. For example,
some suggested titles:

tcp_cubic: fix to run bictcp_update() at least once per round

tcp_cubic: fix to match Reno additive increment

tcp_cubic: fix to use emulated Reno cwnd one round in the future

(5) Please run ./scripts/checkpatch.pl to look for issues before sending:

  ./scripts/checkpatch.pl *patch

(6) Please re-test before sending.

> Bug fix 1:
>         ca->ack_cnt += acked;   /* count the number of ACKed packets */
>
>         if (ca->last_cwnd == cwnd &&
> -           (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
> +           (s32)(tcp_jiffies32 - ca->last_time) <= min(HZ / 32, usecs_to_jiffies(ca->delay_min)))
>                 return;
>
>         /* The CUBIC function can update ca->cnt at most once per jiffy.
>
> The original code bypasses bictcp_update() under certain conditions
> to reduce the CPU overhead. Intuitively, when last_cwnd==cwnd,
> bictcp_update() is executed 32 times per second. As a result,
> it is possible that bictcp_update() is not executed for several
> RTTs when RTT is short (specifically < 1/32 second = 31 ms and
> last_cwnd==cwnd which may happen in small-BDP networks),
> thus leading to low throughput in these RTTs.
>
> The patched code executes bictcp_update() 32 times per second
> if RTT > 31 ms or every RTT if RTT < 31 ms, when last_cwnd==cwnd.
>
> Bug fix 2:
>         if (tcp_friendliness) {
>                 u32 scale = beta_scale;
>
> -               delta = (cwnd * scale) >> 3;
> +               if (cwnd < ca->bic_origin_point)
> +                       delta = (cwnd * scale) >> 3;
> +               else
> +                       delta = cwnd;
>                 while (ca->ack_cnt > delta) {           /* update tcp cwnd */
>                         ca->ack_cnt -= delta;
>                         ca->tcp_cwnd++;
>                 }
>
> The original code follows RFC 8312 (obsoleted CUBIC RFC).
>
> The patched code follows RFC 9438 (new CUBIC RFC).
>
> "Once _W_est_ has grown to reach the _cwnd_ at the time of most
> recently setting _ssthresh_ -- that is, _W_est_ >= _cwnd_prior_ --
> the sender SHOULD set α__cubic_ to 1 to ensure that it can achieve
> the same congestion window increment rate as Reno, which uses AIMD
> (1,0.5)."

Since ca->bic_origin_point does not really hold _cwnd_prior_ in the
case of fast_convergence (which is very common), I would suggest using
a new field, perhaps called ca->cwnd_prior, to hold the actual
_cwnd_prior_ value. Something like the following:

-               delta = (cwnd * scale) >> 3;
+               if (cwnd < ca->cwnd_prior)
+                       delta = (cwnd * scale) >> 3;
+               else
+                       delta = cwnd;

Then, in __bpf_kfunc static u32 cubictcp_recalc_ssthresh(struct sock *sk):

         else
                 ca->last_max_cwnd = tcp_snd_cwnd(tp);
        +ca->cwnd_prior = tcp_snd_cwnd(tp);

How does that sound?

Thanks again for these fixes!

Best regards,
neal