Message-ID: <Pine.LNX.4.64.0712041636020.15251@tesla.psc.edu>
Date: Tue, 4 Dec 2007 21:13:34 -0500 (EST)
From: Matt Mathis <mathis@....edu>
To: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
cc: David Miller <davem@...emloft.net>, Netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net-2.6 0/3]: Three TCP fixes
I can shed light on one detail: rate-halving with bounding parameters uses
snd_cwnd/4 to be appropriately conservative during slow start. Ideally cwnd
would be saved for every transmitted segment, and during recovery ssthresh
and min_cwnd would be set to saved_cwnd/2. However, since cwnd is not saved,
and during slow start it might be doubled each RTT (depending on what other
algorithms are in effect), by the time the loss is detected cwnd/2 might
already be at the maximum window size for the path (pipe + queue size). This
was observed to do some bad things on real networks, so we included another
factor-of-two reduction, which is really only correct for slow start.
However, this does not usually hurt the normal loss-in-congestion-avoidance
case, unless something else unexpected goes wrong.
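
In rough pseudo-C, the bounding parameters described above amount to the
following (the names here are illustrative, not kernel identifiers):

/*
 * Sketch only, not kernel source.  At loss detection only the current
 * cwnd is known, and during slow start it may already be twice the cwnd
 * in effect when the lost segment was sent, so halving twice
 * approximates "half of the cwnd at send time" in that worst case.
 */
struct rh_bounds {
	unsigned int ssthresh;	/* target cwnd after recovery */
	unsigned int min_cwnd;	/* floor rate-halving may not cross */
};

static struct rh_bounds rh_bounds(unsigned int cwnd_at_detection)
{
	struct rh_bounds b;

	b.ssthresh = cwnd_at_detection / 2;	/* the standard halving */
	b.min_cwnd = cwnd_at_detection / 4;	/* extra factor of two; only
						 * strictly correct in
						 * slow start */
	return b;
}
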
We considered some logic to estimate the cwnd in effect when a lost segment
was sent, using the current cwnd and ssthresh (i.e. their values at the time
the loss is detected), but it was cumbersome, not well behaved, impossible to
model, and didn't make enough difference.
Actually, the reason we abandoned RH is that it sets cwnd from the flight
size in a really bad way. When you have large windows and burst losses,
possibly including a few segments with lost retransmissions, you are likely
to depress the flight size, because either you run out of receiver window or
the sending app can't refill the socket quickly enough when snd_una advances.
The consequence is that cwnd is pulled down not by losses but by other,
non-path "congestion", and it can then take thousands of RTTs for AIMD to
recover. Unfortunately, the idea of setting cwnd from the flight size has
been standardized in RFC 2581....
This is one of the problems that I would really like to revisit, if I had the
time.
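
To make the FlightSize issue concrete, a rough sketch of the RFC 2581 rule
in question, expressed in whole segments (the helper name is illustrative):

/*
 * RFC 2581: ssthresh = max(FlightSize / 2, 2 * SMSS).  If the receiver
 * window or a slow application has held FlightSize well below cwnd when
 * losses are detected, the new cwnd reflects that artificial limit
 * rather than real path congestion, and additive increase of one
 * segment per RTT then needs on the order of (old_cwnd - new_cwnd)
 * round trips to win the window back.
 */
static unsigned int ssthresh_from_flight(unsigned int flight_size_segs)
{
	unsigned int half = flight_size_segs / 2;

	return half > 2 ? half : 2;
}
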
Thanks,
--MM--
-------------------------------------------
Matt Mathis http://www.psc.edu/~mathis
Work:412.268.3319 Home/Cell:412.654.7529
-------------------------------------------
On Tue, 4 Dec 2007, John Heffner wrote:
> Ilpo Järvinen wrote:
>> On Tue, 4 Dec 2007, John Heffner wrote:
>>
>>> Ilpo Järvinen wrote:
>>>> ...I have yet to figure out why tcp_cwnd_down uses snd_ssthresh/2 as the
>>>> lower bound even though ssthresh was already halved, so snd_ssthresh
>>>> should suffice.
>>> I remember this coming up at least once before, so it's probably worth a
>>> comment in the code. Rate-halving attempts to actually reduce cwnd to half
>>> the delivered window. Here, cwnd/4 (ssthresh/2) is a lower bound on how far
>>> rate-halving can reduce cwnd. See the "Bounding Parameters" section of
>>> <http://www.psc.edu/networking/papers/FACKnotes/current/>.
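
In rough pseudo-C, the reduction being described looks something like the
following (a paraphrase, not the verbatim tcp_cwnd_down source): cwnd is
walked down by about one segment for every two ACKs, with snd_ssthresh/2,
i.e. one quarter of the pre-loss cwnd, as the floor.

/* "acked" counts ACKs received since the previous step. */
static void rate_halving_step(unsigned int *cwnd, unsigned int ssthresh,
			      unsigned int acked)
{
	unsigned int floor = ssthresh / 2;	/* == pre-loss cwnd / 4 */
	unsigned int decr = acked / 2;		/* one segment per two ACKs */

	if (*cwnd > floor + decr)
		*cwnd -= decr;
	else if (*cwnd > floor)
		*cwnd = floor;			/* never cross the bound */
}
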
>>
>> Thanks for the info! Sadly enough, it makes NewReno recovery quite
>> inefficient when there are enough losses on a high-BDP link (in my case
>> 384k/200ms with a BDP-sized buffer). There might be yet another bug in it
>> as well (it is still a bit unclear how the TCP variables behaved during my
>> scenario, and I'll investigate further), but the reduction in transfer
>> rate lasts much longer than the "short moment" used as motivation in those
>> FACK notes. In fact, if I just use an RFC 2581-like setting without
>> rate-halving (and accept the initial "pause" in sending), the ACK clock
>> for sending out new data works very nicely, beating rate-halving fair and
>> square. For SACK/FACK it works much better because recovery finishes much
>> earlier and slow start recovers cwnd quickly.
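
A back-of-the-envelope check of this 384k/200ms scenario (the ~1448-byte
MSS and the "window is roughly 2 * BDP with a BDP-sized buffer"
approximation are assumptions, not figures from the mail):

#include <stdio.h>

int main(void)
{
	double rate_bps = 384000.0;	/* 384 kbit/s */
	double rtt_s = 0.2;		/* 200 ms */
	double mss = 1448.0;		/* assumed MSS, bytes */

	double bdp_segs = rate_bps / 8.0 * rtt_s / mss;	/* ~6.6 segments */
	double max_win = 2.0 * bdp_segs;		/* ~13 segments */

	printf("BDP                      ~= %.1f segments\n", bdp_segs);
	printf("rate-halving floor (w/4) ~= %.1f segments (below the pipe)\n",
	       max_win / 4.0);
	printf("RFC 2581 ssthresh  (w/2) ~= %.1f segments (about the pipe)\n",
	       max_win / 2.0);
	return 0;
}
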
>
> I believe this is exactly the reason why Matt (CC'd) and Jamshid abandoned
> this line of work in the late 90's. In my opinion, it's probably not such a
> bad idea to use cwnd/2 as the bound. In some situations, the current
> rate-halving code will work better, but as you point out, in others the cwnd
> is lowered too much.
>
>
>> ...Mind if I ask another similar one: any idea why prior_ssthresh is set
>> smaller than cwnd used to be (3/4 of it; see tcp_current_ssthresh)?
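
The 3/4 being asked about comes from a computation along these lines (a
paraphrase of what tcp_current_ssthresh does outside of CWR/Recovery, not
the verbatim kernel source):

/* The larger of the existing ssthresh and 3/4 of the current cwnd. */
static unsigned int current_ssthresh_sketch(unsigned int snd_ssthresh,
					    unsigned int snd_cwnd)
{
	unsigned int three_quarters = (snd_cwnd >> 1) + (snd_cwnd >> 2);

	return snd_ssthresh > three_quarters ? snd_ssthresh : three_quarters;
}
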
>
> Not sure on that one. I'm not aware of any publications this is based on.
> Maybe Alexey knows?
>
> -John
>