lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110308152103.714f5f05@nehalam>
Date:	Tue, 8 Mar 2011 15:21:03 -0800
From:	Stephen Hemminger <shemminger@...tta.com>
To:	David Miller <davem@...emloft.net>
Cc:	rhee@...u.edu, lucas.nussbaum@...ia.fr, xiyou.wangcong@...il.com,
	netdev@...r.kernel.org
Subject: Re: [PATCH] Make CUBIC Hystart more robust to RTT variations

On Tue, 08 Mar 2011 11:43:46 -0800 (PST)
David Miller <davem@...emloft.net> wrote:

> From: Injong Rhee <rhee@...u.edu>
> Date: Tue, 08 Mar 2011 10:26:36 -0500
> 
> > Thanks for updating CUBIC hystart. You might want to test the
> > cases with more background traffic and verify whether this
> > threshold is too conservative.
> 
> So let's get down to basics.
> 
> What does Hystart do specially that allows it to avoid all of the
> problems that TCP VEGAS runs into.
> 
> Specifically, that if you use RTTs to make congestion control
> decisions it is impossible to notice new bandwidth becomming available
> fast enough.
> 
> Again, it's impossible to react fast enough.  No matter what you tweak
> all of your various settings to, this problem will still exist.
> 
> This is a core issue, you cannot get around it.
> 
> This is why I feel that Hystart is fundamentally flawed and we should
> turn it off by default if not flat-out remove it.
> 
> Distributions are turning it off by default already, therefore it's
> stupid for the upstream kernel to behave differently if that's what
> %99 of the world is going to end up experiencing.

The assumption in Hystart that spacing between ACK's is solely due to
congestion is a bad. If you read the paper, this is why FreeBSD's
estimation logic is dismissed. The Hystart problem is different
than the Vegas issue.

Algorithms that look at min RTT are ok, since the lower bound is
fixed; additional queuing and variation in network only increases RTT
it never reduces it. With a min RTT it is possible to compute the
upper bound on available bandwidth. i.e If all packets were as good as
this estimate minRTT then the available bandwidth is X. But then using
an individual RTT sample to estimate unused bandwidth is flawed. To
quote paper.

  "Thus, by checking whether ∆(N ) is larger than Dmin , we
can detect whether cwnd has reached the available capacity
of the path" 

So what goes wrong:
  1. Dmin can be too large because this connection always sees delays
due to other traffic or hardware. i.e buffer bloat.  This would cause
the bandwidth estimate to be too low and therefore TCP would leave
slow start too early (and not get up to full bandwidth).

  2. Dmin can be smaller than the clock resolution. This would cause
either sample to be ignored, or Dmin to be zero. If Dmin is zero,
the bandwidth estimate would in theory be infinite, which would
lead to TCP not leaving slow start because of Hystart. Instead
TCP would leave slow start at first loss.

Other possible problems:
  3. ACK's could be nudged together by variations in delay.
This would cause HyStart to exit slow start prematurely. To false
think it is an ACK train.

Noise in network is not catastrophic, it just
causes TCP to exit slow-start early and have to go into normal
window growth phase. The problem is that the original non-Hystart
behavior of Cubic is unfair; the first flow dominates the link
and other flows are unable to get in. If you run tests with two
flows one will get a larger share of the bandwidth.

I think Hystart is okay in concept but there may be issues
on low RTT links as well as other corner cases that need bug
fixing.

1. Needs to use better resolution than HZ. Since HZ can be 100.
2. Hardcoding 2ms as spacing between ACK's as train is wrong
   for local networks.





--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ