Message-ID: <CAK6E8=d8G_VCmb-PGssuS08xAFOSgUF=TaKzbDMtgRFd9v3PFw@mail.gmail.com>
Date: Wed, 18 Jun 2014 09:56:19 -0700
From: Yuchung Cheng <ycheng@...gle.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
Cc: Michal Kubecek <mkubecek@...e.cz>,
Neal Cardwell <ncardwell@...gle.com>,
"David S. Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
James Morris <jmorris@...ei.org>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
Patrick McHardy <kaber@...sh.net>
Subject: Re: [PATCH net] tcp: avoid multiple ssthresh reductions in one
 retransmit window

On Tue, Jun 17, 2014 at 5:38 PM, Jay Vosburgh
<jay.vosburgh@...onical.com> wrote:
> Michal Kubecek <mkubecek@...e.cz> wrote:
>
>>On Tue, Jun 17, 2014 at 02:35:23PM -0700, Yuchung Cheng wrote:
>>> On Tue, Jun 17, 2014 at 5:20 AM, Michal Kubecek <mkubecek@...e.cz> wrote:
>>> > On Mon, Jun 16, 2014 at 08:44:04PM -0400, Neal Cardwell wrote:
>>> >> On Mon, Jun 16, 2014 at 8:25 PM, Yuchung Cheng <ycheng@...gle.com> wrote:
>>> >> > However Linux is inconsistent on the loss of a retransmission. It
>>> >> > reduces ssthresh (and cwnd) if this happens on a timeout, but not in
>>> >> > fast recovery (tcp_mark_lost_retrans). We should fix that, and doing
>>> >> > so should help in dealing with traffic policers.
>>> >>
>>> >> Yes, great point!
>>> >
>>> > Does it mean the patch itself would be acceptable if the reasoning in
>>> > its commit message was changed? Or would you prefer a different way to
>>> > unify the two situations?
>>>
>>> It's the latter, but it seems to belong in a different patch (and it
>>> won't solve the problem you are seeing).
>>
>>OK, thank you. I guess we will have to persuade them to move to cubic
>>which handles their problems much better.
>>
>>> The idea behind the RFC is that TCP should reduce cwnd and ssthresh
>>> across round trips of send, but not within an RTT. Suppose cwnd was
>>> 10 on first timeout, so cwnd becomes 1 and ssthresh is 5. Then after 3
>>> round trips, we time out again. By the design of Reno this should
>>> reset cwnd from 8 to 1, and ssthresh from 5 to 2.5.
>>
>>Shouldn't that be from 5 to 4? We reduce ssthresh to half of current
>>cwnd, not current ssthresh.
Oops, yes: ssthresh should go to 4 (half of the current cwnd of 8).
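
To make the arithmetic concrete, here is a minimal C sketch of the classic
Reno reaction to a retransmission timeout that we are discussing: ssthresh
is set to half of the *current* cwnd (floored at 2 segments) and cwnd
restarts from 1. It is only an illustration of the rule, not the Linux
code path:

#include <stdio.h>

struct cc_state {
        unsigned int cwnd;      /* congestion window, in segments */
        unsigned int ssthresh;  /* slow start threshold, in segments */
};

static void reno_on_timeout(struct cc_state *s)
{
        unsigned int half = s->cwnd / 2;

        s->ssthresh = half > 2 ? half : 2;
        s->cwnd = 1;
}

int main(void)
{
        struct cc_state s = { .cwnd = 10, .ssthresh = 20 };

        reno_on_timeout(&s);    /* first timeout: cwnd 10 -> 1, ssthresh -> 5 */
        printf("cwnd=%u ssthresh=%u\n", s.cwnd, s.ssthresh);

        s.cwnd = 8;             /* cwnd has grown back to 8 three RTTs later */
        reno_on_timeout(&s);    /* second timeout: cwnd 8 -> 1, ssthresh 5 -> 4 */
        printf("cwnd=%u ssthresh=%u\n", s.cwnd, s.ssthresh);
        return 0;
}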
>>
>>BTW, this is exactly the problem our customer is facing: they have
>>relatively fast line (15 Mb/s) but with big buffers so that the
>>roundtrip times can rise from unloaded 35 ms up to something like 1.5 s
>>under full load.
>>
>>What happens is this: cwnd initially rises to ~2100, then the first drops
>>are encountered; cwnd is set to 1 and ssthresh to ~1050. Slow start
>>lets cwnd reach ssthresh, but after that only slow linear growth follows.
>>In this state, all in-flight packets are dropped (simulation of what
>>happens on router switchover) so that cwnd is reset to 1 again and
>>ssthresh to something like 530-550 (cwnd was a bit higher than ssthresh).
>>If a packet loss comes shortly after that, cwnd is still very low and
>>ssthresh is reduced to half of that cwnd (i.e. much lower than half
>>of ssthresh). If unlucky, one can even end up with ssthresh reduced to 2,
>>which takes a really long time to recover from.
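
Just to illustrate how quickly ssthresh can collapse in that sequence, here
is a small C sketch (not kernel code) that replays it with the plain Reno
timeout rule ssthresh = max(cwnd/2, 2), cwnd = 1. The starting value of
~2100 is taken from your description; the exact timing of the final loss,
two RTTs into slow start, is an assumption for illustration:

#include <stdio.h>

static unsigned int halve(unsigned int cwnd)
{
        unsigned int half = cwnd / 2;

        return half > 2 ? half : 2;     /* never go below 2 segments */
}

int main(void)
{
        unsigned int cwnd = 2100, ssthresh;

        /* First drops: cwnd collapses, ssthresh remembers ~half. */
        ssthresh = halve(cwnd);                 /* ~1050 */
        cwnd = 1;

        /* Slow start back to ssthresh, a bit of linear growth beyond it,
         * then all in-flight packets are dropped (router switchover).
         */
        cwnd = ssthresh + 50;
        ssthresh = halve(cwnd);                 /* ~550 */
        cwnd = 1;

        /* A further loss only two RTTs later: ssthresh is now halved from
         * the current tiny cwnd, not from the previous ~550.
         */
        cwnd = 4;                               /* 1 -> 2 -> 4 in two RTTs */
        ssthresh = halve(cwnd);                 /* floors at 2 */

        printf("final cwnd=%u ssthresh=%u\n", cwnd, ssthresh);
        return 0;
}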
>
> I'm also looking into a problem that exhibits very similar TCP
> characteristics, even down to cwnd and ssthresh values similar to what
> you cite. In this case, the situation has to do with high RTT (around
> 80 ms) connections competing with low RTT (1 ms) connections. This case
> is already using cubic.
>
> Essentially, a high RTT connection to the server transfers data
> in at a reasonable and steady rate until something causes some packets
> to be lost (in this case, another transfer from a low RTT host to the
> same server). Some packets are lost, and cwnd drops from ~2200 to ~300
> (in stages, first to ~1500, then to ~600, then to ~300). The ssthresh
> starts at around 1100, then drops to ~260, which is the lowest cwnd
> value.
>
> The recovery from the low cwnd situation is very slow; cwnd
> climbs a bit and then remains essentially flat for around 5 seconds. It
> then begins to climb until a few packets are lost again, and the cycle
> repeats. If no further losses occur (if the competing traffic has
> ceased, for example), recovery from a low cwnd (roughly 300-750) to the
> full value (~2200) requires on the order of 20 seconds. The connection
> exits recovery state fairly quickly, and most of the 20 seconds is spent
> in open state.
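
For a rough sense of why the recovery you describe takes tens of seconds
even while the connection sits in open state: with CUBIC the time to climb
back to the pre-loss window is a function of W_max and the constants, not
of the RTT. Below is a back-of-the-envelope sketch (plain C, not kernel
code) using the published CUBIC window function W(t) = C*(t-K)^3 + W_max
with the default C = 0.4 and beta = 0.7, taking W_max = 2200 from your
numbers. It ignores the TCP-friendly region, hystart, and the multiple
back-to-back reductions in your trace, so treat it only as an
order-of-magnitude illustration:

/* build: cc cubic_sketch.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
        const double C = 0.4, beta = 0.7;       /* CUBIC defaults */
        const double w_max = 2200.0;            /* segments, from the trace above */
        /* K: time in seconds for the cubic to climb back to W_max */
        const double K = cbrt(w_max * (1.0 - beta) / C);

        printf("K = %.1f s\n", K);
        for (double t = 0.0; t <= 20.0; t += 5.0)
                printf("t = %4.1f s   W(t) = %.0f segments\n",
                       t, C * pow(t - K, 3.0) + w_max);
        return 0;
}

With these defaults, K comes out to roughly 12 seconds for W_max around
2200, which is at least in the same ballpark as the ~20 seconds you see,
before even accounting for the additional reductions.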
ssthresh is problematic. Both cases show the same shortcoming of
Reno/Cubic relying on losses and ssthresh.
If losses are not caused by queue overflows but by link flaps, bursts,
etc., then ssthresh is not indicative of the BDP. It ends up being a fairly
arbitrary value (>> BDP with bufferbloat, << BDP in these cases). TCP
throughput goes south if we hit two losses within a few RTTs, and it's a
point of no return :( Hopefully someone can come up with a more
intelligent control. Several posts on the tcpm list also discuss the
low-ssthresh issue:
http://www.ietf.org/mail-archive/web/tcpm/current/msg08145.html
http://www.ietf.org/mail-archive/web/tcpm/current/msg08778.html
>
> -J
>
> ---
> -Jay Vosburgh, jay.vosburgh@...onical.com