lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iL1g3VQHDfru2yZrHD8EDgKCKGL7-AjYNw+oCdeBQLfow@mail.gmail.com>
Date: Mon, 26 Aug 2024 11:25:59 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Mingrui Zhang <mrzhang97@...il.com>
Cc: davem@...emloft.net, ncardwell@...gle.com, netdev@...r.kernel.org, 
	Lisong Xu <xu@....edu>
Subject: Re: [PATCH net v4 1/3] tcp_cubic: fix to run bictcp_update() at least
 once per RTT

On Sun, Aug 25, 2024 at 7:47 PM Mingrui Zhang <mrzhang97@...il.com> wrote:
>
> On 8/20/24 07:53, Eric Dumazet wrote:
> > On Mon, Aug 19, 2024 at 10:36 PM Mingrui Zhang <mrzhang97@...il.com> wrote:
> >> On 8/19/24 04:00, Eric Dumazet wrote:
> >>> On Sat, Aug 17, 2024 at 6:35 PM Mingrui Zhang <mrzhang97@...il.com> wrote:
> >>>> The original code bypasses bictcp_update() under certain conditions
> >>>> to reduce the CPU overhead. Intuitively, when last_cwnd==cwnd,
> >>>> bictcp_update() is executed 32 times per second. As a result,
> >>>> it is possible that bictcp_update() is not executed for several
> >>>> RTTs when RTT is short (specifically < 1/32 second = 31 ms and
> >>>> last_cwnd==cwnd which may happen in small-BDP networks),
> >>>> thus leading to low throughput in these RTTs.
> >>>>
> >>>> The patched code executes bictcp_update() 32 times per second
> >>>> if RTT > 31 ms or every RTT if RTT < 31 ms, when last_cwnd==cwnd.
> >>>>
> >>>> Fixes: df3271f3361b ("[TCP] BIC: CUBIC window growth (2.0)")
> >>>> Fixes: ac35f562203a ("tcp: bic, cubic: use tcp_jiffies32 instead of tcp_time_stamp")
> >>> I do not understand this Fixes: tag ?
> >>>
> >>> Commit  ac35f562203a was essentially a nop at that time...
> >>>
> >> I may misunderstood the use of Fixes tag and choose the latest commit of that line.
> >>
> >> Shall it supposed to be the very first commit with that behavior?
> >> That is, the very first commit (df3271f3361b ("[TCP] BIC: CUBIC window growth (2.0)")) when the code was first introduced?
> > I was referring to this line : Fixes: ac35f562203a ("tcp: bic, cubic:
> > use tcp_jiffies32 instead of tcp_time_stamp")
> >
> > Commit ac35f562203a did not change the behavior at all.
> >
> > I see no particular reason to mention it, this is confusing.
> >
> >
> >>>> Signed-off-by: Mingrui Zhang <mrzhang97@...il.com>
> >>>> Signed-off-by: Lisong Xu <xu@....edu>
> >>>> ---
> >>>> v3->v4: Replace min() with min_t()
> >>>> v2->v3: Correct the "Fixes:" footer content
> >>>> v1->v2: Separate patches
> >>>>
> >>>>  net/ipv4/tcp_cubic.c | 6 +++++-
> >>>>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
> >>>> index 5dbed91c6178..00da7d592032 100644
> >>>> --- a/net/ipv4/tcp_cubic.c
> >>>> +++ b/net/ipv4/tcp_cubic.c
> >>>> @@ -218,8 +218,12 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
> >>>>
> >>>>         ca->ack_cnt += acked;   /* count the number of ACKed packets */
> >>>>
> >>>> +       /* Update 32 times per second if RTT > 1/32 second,
> >>>> +        * or every RTT if RTT < 1/32 second even when last_cwnd == cwnd
> >>>> +        */
> >>>>         if (ca->last_cwnd == cwnd &&
> >>>> -           (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
> >>>> +           (s32)(tcp_jiffies32 - ca->last_time) <=
> >>>> +           min_t(s32, HZ / 32, usecs_to_jiffies(ca->delay_min)))
> >>> This looks convoluted to me and still limited if HZ=250 (some distros
> >>> still use 250 jiffies per second :/ )
> >>>
> >>> I would suggest switching to usec right away.
> >> Thank you for the suggestion, however, I may need more time to discuss with another author for this revision. :)
> >> Thank you
> > No problem, there is no hurry.
>
> Thank you, Eric, for your suggestion (switching ca->last_time from jiffies to usec)!
> We thought about it and feel that it is more complicated and beyond the scope of this patch.
>
> There are two blocks of code in bictcp_update().
> * Block 1: cubic calculation, which is computationally intensive.
> * Block 2: tcp friendliness, which emulates RENO.
>
> There are two if statements to control how often these two blocks are called to reduce CPU overhead.
>  * If statement 1:  if the condition is true, none of the two blocks are called.
> if (ca->last_cwnd == cwnd &&
>                     (s32)(tcp_jiffies32 - ca->last_time) <= HZ / 32)
>                                 return;
>
> * If statement 2: If the condition is true, block 1 is not called. Intuitively, block 1 is called at most once per jiffy.
> if (ca->epoch_start && tcp_jiffies32 == ca->last_time)
>                                 goto tcp_friendliness;
>
>
> This patch changes only the first if statement. If we switch ca->last_time from jiffies to usec,
> we need to change not only the first if statement but also the second if statement, as well as block 1.
> * change the first if statement from jiffies to usec.
> * change the second if statement from jiffies to usec. Need to determine how often (in usec) block 1 is called
> * change block 1 from jiffies to usec. Should be fine, but need to make sure no calculation overflow.

No problem, I can take care of the jiffies -> usec conversion, you can
then send your patch on top of it.

>
> Therefore, it might be better to keep the current patch as it is, and address the switch from jiffies to usec in future patches.

I prefer you rebase your patch after mine is merged.

There is a common misconception with jiffies.
It can change in less than 20 nsec.
Assuming that delta(jiffies) == 1 means that 1ms has elapsed is plain wrong.
In the old days, linux TCP only could rely on jiffies and we had to
accept its limits.
We now can switch to high resolution clocks, without extra costs,
because we already cache in tcp->tcp_mstamp
the usec timestamp for the current time.

Some distros are using CONFIG_HZ_250=y or CONFIG_HZ_100=y, this means
current logic in cubic is more fuzzy for them.

Without ca->last_time conversion to jiffies, your patch would still be
limited to jiffies resolution:
usecs_to_jiffies(ca->delay_min) would round up to much bigger values
for DC communications.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ