Date:	Tue, 10 Mar 2009 11:49:56 +0100
From:	Marian Ďurkovič <md@....sk>
To:	John Heffner <johnwheffner@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: TCP rx window autotuning harmful at LAN context

On Mon, Mar 09, 2009 at 01:23:15PM -0700, John Heffner wrote:
> >> The situation you describe is exactly what congestion control (the
> >> topic of RFC2914) should fix.  It is not the role of receive window
> >> (flow control).  It is really the sender's job to detect and react to
> >> this, not the receiver's.  (We have had this discussion before on
> >> netdev.)
> >
> > It's not of high importance whose job it is according to pure theory.
> > What matters is that autotuning introduced a serious problem in the LAN
> > context by disabling any possibility to properly react to increasing RTT.
> > Again, it's not important whether this functionality was there by design
> > or by coincidence, but it kept the system well-balanced for many years.
> 
> This is not a theoretical exercise, but one in good system design.
> This "well-balanced" system was really broken all along, and
> autotuning has exposed this.

Yes, sure. However, we have to live with it until something more clever
is ready for production use. The point here is that we should try to
make it work as well as possible, but rx window autotuning without any
safety belts is not a step in that direction - as you say, it exposes
the problem, i.e. makes things worse than before.

> A drop-tail queue size of 1000 packets on a local interface is
> questionable, and I think this is the real source of your problem.

It's just one of the problems we're seeing. Others are happening in the
network infrastructure - some routers also have buffers for 250 msec of
traffic, ethernet switches slightly less, but 100 msec is still a common
case. None of this is something I'm comfortable with in a LAN context.
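
For reference, the arithmetic behind those delays is trivial - a rough
back-of-envelope sketch, assuming full-size 1500-byte frames (the numbers
below are only illustrations):

/* Back-of-envelope queueing delay of a full drop-tail queue.
 * Illustration only; assumes full-size 1500-byte frames.
 */
#include <stdio.h>

static double queue_delay_ms(long packets, long frame_bytes, double line_mbps)
{
	double bits = (double)packets * frame_bytes * 8.0;
	return bits / (line_mbps * 1e6) * 1e3;	/* milliseconds */
}

int main(void)
{
	/* a 1000-packet queue on a 100 Mbps interface */
	printf("1000 pkts @ 100 Mbps: %.0f ms\n",
	       queue_delay_ms(1000, 1500, 100.0));
	/* the same queue on gigabit */
	printf("1000 pkts @ 1 Gbps:   %.0f ms\n",
	       queue_delay_ms(1000, 1500, 1000.0));
	return 0;
}

So a full 1000-packet drop-tail queue on a 100 Mbps interface adds on the
order of 120 msec - the same league as the router buffers mentioned above.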

With a 64 kB receive buffer, you get full linerate utilization and only
about 5 msec of added latency at 100 Mbps in all the above cases, without
any other requirements. Thus it's definitely worth considering whether
there's any possibility to keep this behaviour. It seems other autotuning
TCP stacks are already doing this: one by keeping autotuning much less
aggressive - the window is raised only when more than 7/8 of the current
buffer is received during the last RTT, and by default it's limited to a
maximum of 256 kB (FreeBSD 7) - the other by dynamically limiting the
maximum to a small value (64, 128 or 256 kB) when the RTT is low (Vista).
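
For the record, the 5 msec figure above is just 64 kB of in-flight data
draining at 100 Mbps: 64*1024*8 bits / 100 Mbps ~= 5.2 msec. A rough
sketch of the Vista-style clamping described above might look like the
following - the thresholds are my own illustrative guesses, not what any
real stack implements:

/* Sketch of a receiver-side clamp in the spirit of the Vista behaviour
 * described above: when the measured RTT is small (i.e. we're on a LAN),
 * keep the autotuned maximum small.  Thresholds are illustrative only.
 */
#include <stdio.h>

static unsigned int rcv_clamp_bytes(unsigned int rtt_us)
{
	if (rtt_us < 1000)	/* sub-millisecond RTT: plain LAN   */
		return  64 * 1024;
	if (rtt_us < 10000)	/* a few msec: campus / metro       */
		return 256 * 1024;
	return 0;		/* 0 = no extra clamp (WAN case)    */
}

int main(void)
{
	printf("RTT 300 us -> clamp %u kB\n", rcv_clamp_bytes(300)  / 1024);
	printf("RTT 5 ms   -> clamp %u kB\n", rcv_clamp_bytes(5000) / 1024);
	printf("RTT 80 ms  -> %s\n",
	       rcv_clamp_bytes(80000) ? "clamped" : "no extra clamp");
	return 0;
}

Something along these lines on the receive side would keep LAN latencies
bounded while leaving long-RTT connections free to grow.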

> > - measure RTT during the initial phase of the TCP connection (first X segments)
> > - compute the maximal receive window size from the measured RTT using a
> >   configurable constant representing the bandwidth part of the BDP
> > - let autotuning do its work up to that limit.
> 
> Let's take this proposal, and try it instead at the sender side, as
> part of congestion control.  Would this proposal make sense in that
> position?  Would you seriously consider it there?

The sender does not have the relevant info to implement this - it might be
connected by 10 GE to a highspeed backbone. Limiting the send window to
10 Gbps*RTT would not help at all, as it will certainly be orders of
magnitude higher than most clients need. On the other hand, a workstation
connected at 100 Mbps certainly knows it can't receive more, so there's
negligible advantage in setting the rx window higher than 100 Mbps*RTT.
In a LAN context, this will significantly reduce the maximum allowed rx window.
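
To make the proposal concrete, here's a minimal sketch of the cap I have
in mind - purely an illustration, not a patch; the names and units are
invented for the example:

/* Receiver-side cap: max rx window = local link rate times the RTT
 * measured over the first few segments, never below the traditional
 * 64 kB.  Illustration of the idea only.
 */
#include <stdio.h>

static unsigned long rcv_window_cap(unsigned long link_mbps,
				    unsigned long rtt_us)
{
	/* BDP in bytes: Mbit/s equals bits per microsecond, so
	 * rate [bit/us] * rtt [us] / 8 gives bytes */
	unsigned long bdp = link_mbps * rtt_us / 8;

	return bdp < 64 * 1024 ? 64 * 1024 : bdp;
}

int main(void)
{
	/* 100 Mbps workstation: LAN (RTT 0.5 ms) vs. WAN (RTT 50 ms) */
	printf("LAN: cap = %lu kB\n", rcv_window_cap(100,   500) / 1024);
	printf("WAN: cap = %lu kB\n", rcv_window_cap(100, 50000) / 1024);
	return 0;
}

Autotuning would then work exactly as it does today, only up to that limit,
so a host that really sees tens of milliseconds of RTT still gets a window
matching its own link speed.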

As for AQM - it will only help if *all* network devices between sender
and receiver implement it, which may be possible within a single
management domain but has proven never to work at Internet scale. Thus
it's much better and safer to implement all congestion control mechanisms
at the sender or at the receiver, using all available methods, even if
they were not originally meant for this purpose in pure theory. After
all, it's not the network's job to work around layer-4 protocol problems
at every network device...
 

       With kind regards,

             M.
