Message-ID: <20090309133324.0dd56f82@nehalam>
Date:	Mon, 9 Mar 2009 13:33:24 -0700
From:	Stephen Hemminger <shemminger@...tta.com>
To:	John Heffner <johnwheffner@...il.com>
Cc:	Marian Ďurkovič <md@....sk>,
	netdev@...r.kernel.org
Subject: Re: TCP rx window autotuning harmful at LAN context

On Mon, 9 Mar 2009 13:23:15 -0700
John Heffner <johnwheffner@...il.com> wrote:

> On Mon, Mar 9, 2009 at 1:02 PM, Marian Ďurkovič <md@....sk> wrote:
> > On Mon, 9 Mar 2009 11:01:52 -0700, John Heffner wrote
> >> On Mon, Mar 9, 2009 at 4:25 AM, Marian Ďurkovič <md@....sk> wrote:
> >> >   Since rx window autotuning is enabled in all recent kernels, and with 1 GB
> >> > of RAM the maximum tcp_rmem becomes 4 MB, this problem is spreading rapidly
> >> > and we believe it needs urgent attention. As demonstrated above, such a huge
> >> > rx window (at least 100*BDP in the example above) delivers no performance
> >> > gain but instead seriously harms other hosts and/or applications. It should
> >> > also be noted that a host with autotuning enabled steals an unfair share of
> >> > the total available bandwidth, which might look like a "better" performing
> >> > TCP stack at first sight; however, such behaviour is not appropriate
> >> > (RFC 2914, section 3.2).
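
As a quick sanity check on the "100*BDP" figure, here is a back-of-the-envelope
sketch; the 1 Gbit/s link speed and 0.3 ms RTT below are illustrative
assumptions, not numbers taken from the example referenced above:

/* bdp.c: compare an assumed LAN bandwidth-delay product against the
 * 4 MB autotuned tcp_rmem cap mentioned above.  Link speed and RTT
 * are illustrative assumptions only.
 */
#include <stdio.h>

int main(void)
{
        const double link_bps = 1e9;               /* assumed 1 Gbit/s LAN  */
        const double rtt_s    = 0.0003;            /* assumed 0.3 ms RTT    */
        const double rmem_max = 4.0 * 1024 * 1024; /* default autotuned cap */
        double bdp_bytes = link_bps / 8.0 * rtt_s;

        printf("BDP:       %.0f bytes\n", bdp_bytes);
        printf("rmem cap:  %.0f bytes\n", rmem_max);
        printf("cap / BDP: %.1f\n", rmem_max / bdp_bytes);
        return 0;
}

With those assumptions the 4 MB cap works out to roughly 110 times the BDP.
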
> >>
> >> It's well known that "standard" TCP fills all available drop-tail
> >> buffers, and that this behavior is not desirable.
> >
> > Well, in practice that was always limited by the receive window size, which
> > was 64 kB by default on most operating systems. So this undesirable behavior
> > was limited to hosts where the receive window had been manually increased to
> > huge values.
> >
> > Today, the real effect of autotuning is the same as setting the receive
> > window to 4 MB on *all* hosts, since there is no mechanism to prevent it
> > from growing the window to the maximum even on low-RTT paths.
> >
> >> The situation you describe is exactly what congestion control (the
> >> topic of RFC 2914) should fix.  It is not the role of the receive
> >> window (flow control).  It is really the sender's job to detect and
> >> react to this, not the receiver's.  (We have had this discussion
> >> before on netdev.)
> >
> > It's not particularly important whose job it is in pure theory.  What
> > matters is that autotuning introduced a serious problem in the LAN context
> > by removing any possibility of properly reacting to increasing RTT.  Again,
> > it's not important whether this functionality was there by design or by
> > coincidence; it kept the system well balanced for many years.
> 
> This is not a theoretical exercise, but one in good system design.
> This "well-balanced" system was really broken all along, and
> autotuning has exposed this.
> 
> A drop-tail queue size of 1000 packets on a local interface is
> questionable, and I think this is the real source of your problem.
> This change was introduced on most drivers a few years ago; the default
> generally used to be 100.  The increase was made partly because TCP
> slow-start has problems when a drop-tail queue is smaller than the
> BDP.  (Limited slow-start is meant to address this problem, but
> requires tuning to the right value.)  Again, using AQM is likely the
> best solution.

By default, the sky2 queue is 511 packets, which is about 6.2 ms at 1 Gbit/s.
It should probably be half that by default.  There is also the software
transmit queue, which could be set to 0 unless some form of AQM is being
done.
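
For reference, that latency is just the drain time of a full ring at line
rate; a minimal sketch of the arithmetic, where the 1538-byte on-wire frame
size (full-size frame plus preamble, FCS and inter-frame gap) is an
assumption:

/* drain.c: drain time of a full sky2 tx ring at line rate. */
#include <stdio.h>

int main(void)
{
        const double ring_pkts  = 511;   /* default ring size quoted above */
        const double wire_bytes = 1538;  /* assumed on-wire frame size     */
        const double link_bps   = 1e9;   /* 1 Gbit/s                       */
        double drain_s = ring_pkts * wire_bytes * 8.0 / link_bps;

        printf("ring drain time: %.1f ms\n", drain_s * 1000.0);
        return 0;
}

With those assumptions it comes out near 6.3 ms; with bare 1500-byte frames
it is closer to 6.1 ms.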

> 
> > Now, as autotuning is enabled by default in the stock kernel, this problem
> > is spreading into LANs without users even knowing what's going on. Therefore
> > I'd like to suggest looking for a decent fix which could be implemented in a
> > relatively short time frame. My proposal is this:
> >
> > - measure the RTT during the initial phase of the TCP connection (first X
> >   segments)
> > - compute the maximal receive window size from the measured RTT, using a
> >   configurable constant representing the bandwidth part of the BDP
> > - let autotuning do its work up to that limit.
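
To make the arithmetic in the quoted proposal concrete, here is a minimal
userspace sketch of the clamp computation; the measured RTT, the bandwidth
constant and the 4 MB cap are all illustrative assumptions, not an actual
implementation:

/* clamp.c: userspace sketch of the clamp computation in the proposal
 * above.  All values are illustrative.
 */
#include <stdio.h>

/* stand-in for an RTT measured over the first X segments (seconds) */
static double measure_initial_rtt(void)
{
        return 0.0003;          /* pretend we saw 0.3 ms on a LAN */
}

int main(void)
{
        const double assumed_bw_bps = 1e9;               /* configurable constant */
        const double rmem_max       = 4.0 * 1024 * 1024; /* existing tcp_rmem cap */
        double rtt   = measure_initial_rtt();
        double clamp = assumed_bw_bps / 8.0 * rtt;       /* BDP-style upper bound */

        if (clamp > rmem_max)
                clamp = rmem_max;                        /* never exceed tcp_rmem */

        printf("receive window clamp: %.0f bytes\n", clamp);
        /* autotuning would then be allowed to grow the window only up to
         * this clamp instead of all the way to tcp_rmem[2]. */
        return 0;
}
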
> 
> Let's take this proposal and try it instead at the sender side, as
> part of congestion control.  Would this proposal make sense in that
> position?  Would you seriously consider it there?
> 
> (As a side note, this is in fact what happens if you disable
> timestamps, since without timestamps TCP cannot get an updated
> measurement of the RTT, only a lower bound.  However, I consider
> this a limitation, not a feature.)
> 
>   -John
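
One practical aside: an application that knows it is talking across a LAN
can already opt out of receive autotuning by setting SO_RCVBUF before
connect(), which locks the buffer size for that socket.  A minimal sketch of
that workaround (the peer address, port and 64 kB figure are placeholders):

/* rcvbuf.c: per-application workaround sketch.  Setting SO_RCVBUF
 * before connect() locks the receive buffer for that socket, so
 * receive autotuning no longer grows it.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        struct sockaddr_in addr;
        int rcvbuf = 64 * 1024;          /* placeholder cap */
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0) {
                perror("socket");
                return 1;
        }

        /* must happen before connect() to affect the advertised window */
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
                perror("setsockopt(SO_RCVBUF)");

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(5001);                      /* placeholder port */
        inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);  /* placeholder peer */

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                perror("connect");

        close(fd);
        return 0;
}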
