netdev - Re: [PATCH] Make CUBIC Hystart more robust to RTT variations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AANLkTi=k+9ObSFjS2txWWqrAAR6WD+TJpnnxfi+W-xJj@mail.gmail.com>
Date:	Tue, 8 Mar 2011 20:33:39 -0500
From:	Sangtae Ha <sangtae.ha@...il.com>
To:	Stephen Hemminger <shemminger@...tta.com>
Cc:	David Miller <davem@...emloft.net>, rhee@...u.edu,
	lucas.nussbaum@...ia.fr, xiyou.wangcong@...il.com,
	netdev@...r.kernel.org
Subject: Re: [PATCH] Make CUBIC Hystart more robust to RTT variations

Hi Stephen,

Thank you for your feedback. Please see my answers below.

On Tue, Mar 8, 2011 at 6:21 PM, Stephen Hemminger <shemminger@...tta.com> wrote:
> On Tue, 08 Mar 2011 11:43:46 -0800 (PST)
> David Miller <davem@...emloft.net> wrote:
>
>> From: Injong Rhee <rhee@...u.edu>
>> Date: Tue, 08 Mar 2011 10:26:36 -0500
>>
>> > Thanks for updating CUBIC hystart. You might want to test the
>> > cases with more background traffic and verify whether this
>> > threshold is too conservative.
>>
>> So let's get down to basics.
>>
>> What does Hystart do specially that allows it to avoid all of the
>> problems that TCP VEGAS runs into.
>>
>> Specifically, that if you use RTTs to make congestion control
>> decisions it is impossible to notice new bandwidth becomming available
>> fast enough.
>>
>> Again, it's impossible to react fast enough.  No matter what you tweak
>> all of your various settings to, this problem will still exist.
>>
>> This is a core issue, you cannot get around it.
>>
>> This is why I feel that Hystart is fundamentally flawed and we should
>> turn it off by default if not flat-out remove it.
>>
>> Distributions are turning it off by default already, therefore it's
>> stupid for the upstream kernel to behave differently if that's what
>> %99 of the world is going to end up experiencing.
>
> The assumption in Hystart that spacing between ACK's is solely due to
> congestion is a bad. If you read the paper, this is why FreeBSD's
> estimation logic is dismissed. The Hystart problem is different
> than the Vegas issue.
>
> Algorithms that look at min RTT are ok, since the lower bound is
> fixed; additional queuing and variation in network only increases RTT
> it never reduces it. With a min RTT it is possible to compute the
> upper bound on available bandwidth. i.e If all packets were as good as
> this estimate minRTT then the available bandwidth is X. But then using
> an individual RTT sample to estimate unused bandwidth is flawed. To
> quote paper.
>
>  "Thus, by checking whether ∆(N ) is larger than Dmin , we
> can detect whether cwnd has reached the available capacity
> of the path"
>
> So what goes wrong:
>  1. Dmin can be too large because this connection always sees delays
> due to other traffic or hardware. i.e buffer bloat.  This would cause
> the bandwidth estimate to be too low and therefore TCP would leave
> slow start too early (and not get up to full bandwidth).

This is true. But the idea is that running the congestion avoidance
algorithm of CUBIC for this case is better than hurting other flows
with abrupt perturbation, since the growth of CUBIC is quite
responsive and grab the bandwidth quickly in normal network
conditions.

>
>  2. Dmin can be smaller than the clock resolution. This would cause
> either sample to be ignored, or Dmin to be zero. If Dmin is zero,
> the bandwidth estimate would in theory be infinite, which would
> lead to TCP not leaving slow start because of Hystart. Instead
> TCP would leave slow start at first loss.

True. But since HyStart didn't clamp the threshold,  ca->delay_min>>4,
it can prematurely leave slow start for very small Dmin. I think this
needs to be fixed, along with the hard-coded 2ms you mentioned below.


>
> Other possible problems:
>  3. ACK's could be nudged together by variations in delay.
> This would cause HyStart to exit slow start prematurely. To false
> think it is an ACK train.

This doesn't happen when the delay is not too small (in typical WAN
including DSL), but it is possible with very small delays since the
code checking the valid ACK train uses the 2ms fixed value, which is
large for LAN.

>
> Noise in network is not catastrophic, it just
> causes TCP to exit slow-start early and have to go into normal
> window growth phase. The problem is that the original non-Hystart
> behavior of Cubic is unfair; the first flow dominates the link
> and other flows are unable to get in. If you run tests with two
> flows one will get a larger share of the bandwidth.
>
> I think Hystart is okay in concept but there may be issues
> on low RTT links as well as other corner cases that need bug
> fixing.

We do not use the delay as indication of congestion, but we use it for
improving the stability and overall performance.
Preventing burst losses quite help for mid to large BDP paths and the
performance results with non-TCP-SACK receivers are also encouraging.
I will work on the fix for the issues below.

>
> 1. Needs to use better resolution than HZ. Since HZ can be 100.
> 2. Hardcoding 2ms as spacing between ACK's as train is wrong
>   for local networks.
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html