Message-ID: <50B4FFDA.1050808@hp.com>
Date: Tue, 27 Nov 2012 10:00:58 -0800
From: Rick Jones <rick.jones2@...com>
To: Saku Ytti <saku@...i.fi>
CC: netdev@...r.kernel.org
Subject: Re: TCP and reordering
On 11/27/2012 09:15 AM, Saku Ytti wrote:
> On 27 November 2012 19:05, Rick Jones <rick.jones2@...com> wrote:
>
>> Packet reordering is supposed to be the exception, not the rule.
>> Links which habitually/constantly introduce reordering are, in my
>> opinion, broken. Optimizing for them would be optimizing an error
>> case.
>
> TCP used to be friendly to reordering before the fast retransmit
> optimization was implemented.
It remained "friendly" to reordering even after fast retransmit was
implemented - just not to particularly bad reordering.
And "friendly" is somewhat relative. Even before fast retransmit came
to be, TCP would immediately ACK each out-of-order segment.
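In rough pseudo-C, that receiver rule plus the sender's duplicate-ACK
counting is the whole mechanism. A simplified sketch, not any
particular stack's code:

  #include <stdbool.h>
  #include <stdint.h>

  #define DUPACK_THRESH 3 /* RFC 5681 fast retransmit threshold */

  /* Receiver: an out-of-order segment is ACKed immediately (a
   * duplicate ACK re-advertising rcv_nxt); only in-order data is
   * eligible for delayed-ACK treatment. */
  static bool ack_may_be_delayed(uint32_t seg_seq, uint32_t rcv_nxt)
  {
          return seg_seq == rcv_nxt;
  }

  /* Sender: count duplicate ACKs; the third triggers fast retransmit
   * without waiting for the retransmission timer. */
  static bool fast_retransmit_due(uint32_t ack_seq, uint32_t *last_ack,
                                  int *dupacks)
  {
          if (ack_seq == *last_ack)
                  return ++(*dupacks) == DUPACK_THRESH;
          *last_ack = ack_seq;
          *dupacks = 0;
          return false;
  }

Reordering by a segment or two yields at most a couple of duplicate
ACKs and is absorbed; reordering by three or more segments crosses the
threshold and becomes indistinguishable from loss.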
> It seems like minimal complexity in the TCP algorithm, and it would
> dynamically do the right thing depending on the situation. It is
> rather slim comfort that the network should work, when it does not
> and you cannot affect it.
It is probably considered an "ancient" text these days, but one of the
chapter intros for The Mythical Man Month includes a quote from Ovid:
adde parvum parvo magnus acervus erit
which, if recollection serves, the book translated as:
add little to little and soon there will be a big pile.
> But if the complexity is higher than I expect, then I fully agree, it
> makes no sense to add it. The reason reordering can happen in a
> modern MPLS network is that you have to essentially duck-type your
> traffic, and sometimes you type it wrong; you are then calculating
> ECMP on incorrect values, causing packets inside a flow to take
> different ports.
I appreciate that one may not always have "access" and there can be
layer 8 and layer 9 issues involved, but if incorrect typing is the
root cause of the reordering, treating the root cause rather than the
symptom is what should happen. How many kludges, no matter how angelic,
can fit in a TCP implementation?
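To make the mis-typing concrete: the usual guess is to peek at the
first nibble of the MPLS payload and treat a 4 as IPv4. A hypothetical
sketch - the offsets are IPv4's, everything else is invented for
illustration:

  #include <stdint.h>

  static unsigned int ecmp_port(const uint8_t *payload,
                                unsigned int nports)
  {
          uint32_t h = 0;
          int i;

          if ((payload[0] >> 4) == 4) {
                  /* Guess: IPv4. Hash the source/destination address
                   * bytes (offsets 12..19). If the guess is wrong - a
                   * non-IP payload that merely starts with a 4 - these
                   * bytes are arbitrary and can differ from packet to
                   * packet within one flow, spraying it across ports. */
                  for (i = 12; i < 20; i++)
                          h = h * 31 + payload[i];
          } else {
                  /* Unrecognized payload: hash only the (constant)
                   * label stack, elided here, so the whole LSP sticks
                   * to one port - safe but unbalanced. */
                  h = 0;
          }
          return h % nports;
  }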
For other reasons (CPU utilization) various stacks (HP-UX, Solaris, some
versions of MacOS) have had explicit ACK-avoidance heuristics. They
would back off from ack-every-other to ack-every-N, N >> 2. The
heuristics worked quite well in LAN environments and on bulk flows (e.g.
FTP, ttcp, netperf TCP_STREAM), not necessarily as well in other
environments. One (very necessary) part of the heuristic in those
stacks was to go back to ack-every-other when needed. That keyed off
the standalone ACK timer - if it ever fired, the current avoidance
count would go back to 2, and the maximum allowed would be half what it
was before. However, that took a non-trivial performance hit when there
was "tail drop" on something that wasn't a continuous stream of traffic
- the tail got dropped, so there were no out-of-order arrivals to force
immediate ACKs. (*) The standalone ACK timer was then the only thing
getting us back out, which means idleness. I worked a number of "WAN
performance problems" involving one of those stacks where part of the
solution was turning down the limits on the ACK-avoidance heuristic by
a considerable quantity. (And I say this as someone with a fondness for
the schemes.)
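The shape of those heuristics, with invented names and numbers rather
than anything from HP-UX or Solaris source, was roughly:

  struct ack_avoid {
          int every;      /* currently ACKing every 'every' segments */
          int max_every;  /* ceiling the heuristic may ramp up to */
          int unacked;    /* in-order segments since the last ACK */
  };

  /* Per in-order data segment: nonzero means send an ACK now. */
  static int ack_now(struct ack_avoid *a)
  {
          if (++a->unacked < a->every)
                  return 0;
          a->unacked = 0;
          if (a->every < a->max_every)
                  a->every++;  /* ramp up while the flow looks healthy */
          return 1;
  }

  /* Standalone ACK timer fired: the avoidance was too aggressive.
   * Drop back to ack-every-other and halve the allowed maximum. */
  static void ack_timer_fired(struct ack_avoid *a)
  {
          a->every = 2;
          if (a->max_every > 2)
                  a->max_every /= 2;
  }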
I cannot say with certainty that your idea would have the same
problems, but as you look to work out a solution to propose as a patch,
you will have to keep that in mind.
rick
* yes, the same holds true for a non-ack-avoiding setup; the heuristic
simply made it worse - especially if the sender wanted to send but had
gotten limited by cwnd. The ACK(s) of the head of that chunk of data
were "avoided" and so wouldn't open the cwnd, which might otherwise
have allowed further segments to flow and enable detection of the
dropped segments. Even without losses it also tended to interact poorly
with sending TCPs that wanted to increase the congestion window by one
MSS for each ACK rather than based on the quantity of bytes covered by
the ACK.
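That last interaction is easy to see side by side. A sketch only, in
the spirit of the byte counting later standardized in RFC 3465:

  #include <stdint.h>

  /* Classic per-ACK growth: +1 MSS per ACK received, so a receiver
   * that ACKs once per N segments slows cwnd growth by a factor of
   * N. */
  static void cwnd_grow_per_ack(uint32_t *cwnd, uint32_t mss)
  {
          *cwnd += mss;
  }

  /* Byte counting (cf. RFC 3465): grow by the bytes the ACK covers,
   * so one stretch ACK covering N segments grows cwnd as much as N
   * separate ACKs would. (RFC 3465 also caps the per-ACK increment;
   * omitted here.) */
  static void cwnd_grow_byte_count(uint32_t *cwnd, uint32_t bytes_acked)
  {
          *cwnd += bytes_acked;
  }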