Message-ID: <Pine.LNX.4.64.0712041636020.15251@tesla.psc.edu>
Date: Tue, 4 Dec 2007 21:13:34 -0500 (EST)
From: Matt Mathis <mathis@....edu>
To: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
cc: David Miller <davem@...emloft.net>, Netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net-2.6 0/3]: Three TCP fixes
I can shed light on one detail: rate-halving with bounding parameters uses
snd_cwnd/4 to be appropriately conservative during slow start. Ideally cwnd
would be saved for every transmitted segment, and during recovery ssthresh
and min_cwnd would be set to saved_cwnd/2. However, since cwnd is not saved,
and during slow start it might be doubled each RTT (depending on what other
algorithms are in effect), by the time the loss is detected cwnd/2 might
already be at the maximum window size for the path (pipe + queue size). This
was observed to do some bad things on real networks, so we included another
factor-of-two reduction, which is really only correct for slow start.
However, this does not usually hurt the normal loss-in-congestion-avoidance
case, unless something else unexpected goes wrong.
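
In rough pseudo-C, the bounding parameters described above amount to the
following (the names here are illustrative, not kernel identifiers):

/*
 * Sketch only, not kernel source.  At loss detection only the current
 * cwnd is known, and during slow start it may already be twice the cwnd
 * in effect when the lost segment was sent, so halving twice
 * approximates "half of the cwnd at send time" in that worst case.
 */
struct rh_bounds {
	unsigned int ssthresh;	/* target cwnd after recovery */
	unsigned int min_cwnd;	/* floor rate-halving may not cross */
};

static struct rh_bounds rh_bounds(unsigned int cwnd_at_detection)
{
	struct rh_bounds b;

	b.ssthresh = cwnd_at_detection / 2;	/* the standard halving */
	b.min_cwnd = cwnd_at_detection / 4;	/* extra factor of two; only
						 * strictly correct in
						 * slow start */
	return b;
}
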
We considered some logic to estimate the cwnd in effect when a lost segment
was sent, using the current cwnd and ssthresh (i.e. their values at the time
the loss is detected), but it was cumbersome, not well behaved, impossible to
model, and didn't make enough difference.
Actually, the reason we abandoned RH is that it sets cwnd from the flight
size in a really bad way. When you have large windows and burst losses,
possibly including a few segments with lost retransmissions, you are likely
to depress the flight size, because either you run out of receiver window or
the sending app can't refill the socket quickly enough when snd_una advances.
The consequence is that cwnd is pulled down not by losses but by other,
non-path "congestion", and it can then take thousands of RTTs for AIMD to
recover. Unfortunately, the idea of setting cwnd from the flight size has
been standardized in RFC 2581....
This is one of the problems that I would really like to revisit, if I had the
time.
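
To make the FlightSize issue concrete, a rough sketch of the RFC 2581 rule
in question, expressed in whole segments (the helper name is illustrative):

/*
 * RFC 2581: ssthresh = max(FlightSize / 2, 2 * SMSS).  If the receiver
 * window or a slow application has held FlightSize well below cwnd when
 * losses are detected, the new cwnd reflects that artificial limit
 * rather than real path congestion, and additive increase of one
 * segment per RTT then needs on the order of (old_cwnd - new_cwnd)
 * round trips to win the window back.
 */
static unsigned int ssthresh_from_flight(unsigned int flight_size_segs)
{
	unsigned int half = flight_size_segs / 2;

	return half > 2 ? half : 2;
}
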
Thanks,
--MM--
-------------------------------------------
Matt Mathis http://www.psc.edu/~mathis
Work:412.268.3319 Home/Cell:412.654.7529
-------------------------------------------
On Tue, 4 Dec 2007, John Heffner wrote:
> Ilpo Järvinen wrote:
>> On Tue, 4 Dec 2007, John Heffner wrote:
>>
>>> Ilpo Järvinen wrote:
>>>> ...I have yet to figure out why tcp_cwnd_down uses snd_ssthresh/2 as the
>>>> lower bound even though ssthresh was already halved, so snd_ssthresh
>>>> should suffice.
>>> I remember this coming up at least once before, so it's probably worth a
>>> comment in the code. Rate-halving attempts to actually reduce cwnd to half
>>> the delivered window. Here, cwnd/4 (ssthresh/2) is a lower bound on how far
>>> rate-halving can reduce cwnd. See the "Bounding Parameters" section of
>>> <http://www.psc.edu/networking/papers/FACKnotes/current/>.
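
In rough pseudo-C, the reduction being described looks something like the
following (a paraphrase, not the verbatim tcp_cwnd_down source): cwnd is
walked down by about one segment for every two ACKs, with snd_ssthresh/2,
i.e. one quarter of the pre-loss cwnd, as the floor.

/* "acked" counts ACKs received since the previous step. */
static void rate_halving_step(unsigned int *cwnd, unsigned int ssthresh,
			      unsigned int acked)
{
	unsigned int floor = ssthresh / 2;	/* == pre-loss cwnd / 4 */
	unsigned int decr = acked / 2;		/* one segment per two ACKs */

	if (*cwnd > floor + decr)
		*cwnd -= decr;
	else if (*cwnd > floor)
		*cwnd = floor;			/* never cross the bound */
}
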
>>
>> Thanks for the info! Sadly enough, it makes NewReno recovery quite
>> inefficient when there are enough losses on a high-BDP link (in my case
>> 384k/200ms with a BDP-sized buffer). There might be yet another bug in it
>> as well (it is still a bit unclear how the TCP variables behaved during my
>> scenario, and I'll investigate further), but the reduction in transfer
>> rate lasts much longer than the "short moment" used as motivation in those
>> FACK notes. In fact, if I just use an RFC 2581-like setting without
>> rate-halving (and accept the initial "pause" in sending), the ACK clock
>> for sending out new data works very nicely, beating rate-halving fair and
>> square. For SACK/FACK it works much better because recovery finishes much
>> earlier and slow start recovers cwnd quickly.
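
A back-of-the-envelope check of this 384k/200ms scenario (the ~1448-byte
MSS and the "window is roughly 2 * BDP with a BDP-sized buffer"
approximation are assumptions, not figures from the mail):

#include <stdio.h>

int main(void)
{
	double rate_bps = 384000.0;	/* 384 kbit/s */
	double rtt_s = 0.2;		/* 200 ms */
	double mss = 1448.0;		/* assumed MSS, bytes */

	double bdp_segs = rate_bps / 8.0 * rtt_s / mss;	/* ~6.6 segments */
	double max_win = 2.0 * bdp_segs;		/* ~13 segments */

	printf("BDP                      ~= %.1f segments\n", bdp_segs);
	printf("rate-halving floor (w/4) ~= %.1f segments (below the pipe)\n",
	       max_win / 4.0);
	printf("RFC 2581 ssthresh  (w/2) ~= %.1f segments (about the pipe)\n",
	       max_win / 2.0);
	return 0;
}
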
>
> I believe this is exactly the reason why Matt (CC'd) and Jamshid abandoned
> this line of work in the late 90's. In my opinion, it's probably not such a
> bad idea to use cwnd/2 as the bound. In some situations, the current
> rate-halving code will work better, but as you point out, in others the cwnd
> is lowered too much.
>
>
>> ...Mind if I ask another similar one: any idea why prior_ssthresh is set
>> smaller than cwnd used to be (3/4 of it; see tcp_current_ssthresh)?
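
The 3/4 being asked about comes from a computation along these lines (a
paraphrase of what tcp_current_ssthresh does outside of CWR/Recovery, not
the verbatim kernel source):

/* The larger of the existing ssthresh and 3/4 of the current cwnd. */
static unsigned int current_ssthresh_sketch(unsigned int snd_ssthresh,
					    unsigned int snd_cwnd)
{
	unsigned int three_quarters = (snd_cwnd >> 1) + (snd_cwnd >> 2);

	return snd_ssthresh > three_quarters ? snd_ssthresh : three_quarters;
}
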
>
> Not sure on that one. I'm not aware of any publications this is based on.
> Maybe Alexey knows?
>
> -John
>