Message-id: <4A8409CA.70200@nets.rwth-aachen.de>
Date: Thu, 13 Aug 2009 14:40:42 +0200
From: Arnd Hannemann <hannemann@...s.rwth-aachen.de>
To: David Miller <davem@...emloft.net>
Cc: "slot.daniel@...il.com" <slot.daniel@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH] net/ipv4, linux-2.6.30.4
David Miller wrote:
> From: Daniel Slot <slot.daniel@...il.com>
> Date: Wed, 12 Aug 2009 20:47:44 +0200
>
>> RFC 4653 specifies Non-Congestion Robustness (NCR) for TCP.
>> In the absence of explicit congestion notification from the network, TCP
>> uses loss as an indication of congestion.
>> One of the ways TCP detects loss is using the arrival of three duplicate
>> acknowledgments.
>> However, this heuristic is not always correct,
>> notably in the case when network paths reorder segments (for whatever
>> reason), resulting in degraded performance.
>
> Linux's TCP stack already has sophisticated reordering detection.
Hmm, sophisticated? Sorry, it seemed pretty rudimentary/random to me.
Firstly, tp->reordering never shrinks for a given connection unless an RTO occurs.
If that happens, tp->reordering is reset to sysctl_tcp_reordering (even though it
may have been initialized with a different value from the destination cache).
Why?
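
To make that concrete, here is a stripped-down user-space paraphrase of the
behaviour I mean (this is not the actual tcp_input.c code, the names and values
are made up for illustration): the estimate only ever ratchets upwards, and the
single way down is the RTO reset to the sysctl.

#include <stdio.h>

#define SYSCTL_TCP_REORDERING 3     /* default dupthresh */
#define TCP_MAX_REORDERING    127

struct conn {
    int reordering;                 /* stands in for tp->reordering */
};

/* grows on detected reordering, never shrinks */
static void reord_update(struct conn *c, int metric)
{
    if (metric > c->reordering)
        c->reordering = metric < TCP_MAX_REORDERING ?
                        metric : TCP_MAX_REORDERING;
}

/* the only way down: the RTO path resets to the sysctl,
 * not to the value the connection started out with */
static void reord_reset_on_rto(struct conn *c)
{
    c->reordering = SYSCTL_TCP_REORDERING;
}

int main(void)
{
    struct conn c = { .reordering = 10 };       /* e.g. from dst cache */

    reord_update(&c, 40);                       /* heavy reordering, once */
    printf("after event: %d\n", c.reordering);  /* stays at 40 forever */
    reord_reset_on_rto(&c);
    printf("after RTO:   %d\n", c.reordering);  /* back to 3, not to 10 */
    return 0;
}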
Secondly, it simply disables FACK? Disabling FACK completely may (or may not) be
the correct response when reordering is present. But why not re-enable FACK
once no more reordering is detected? It does not even get re-enabled after an RTO.
It seems even stranger that tp->reordering is used in the FACK paths, too.
So if one sets a high sysctl_tcp_reordering because one expects reordering,
tcp_update_reordering will probably NOT disable FACK; instead FACK will
be used with a high tp->reordering value.
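
To spell out what I mean (again a simplified user-space sketch, not the kernel
source; I shortened and invented the field and helper names): with FACK enabled
the "duplicate ACK count" is effectively the FACK gap, and it is compared against
the very same tp->reordering value that a high sysctl inflates.

#include <stdio.h>

struct conn {
    int fack_enabled;   /* cleared once reordering is detected */
    int reordering;     /* tp->reordering, doubles as dupthresh */
    int fackets_out;    /* FACK gap: highest SACKed - snd_una, in packets */
    int sacked_out;     /* number of SACKed packets */
};

static int dupack_count(const struct conn *c)
{
    /* with FACK the "duplicate ACK count" is really the FACK gap */
    return c->fack_enabled ? c->fackets_out : c->sacked_out + 1;
}

static int time_to_recover(const struct conn *c)
{
    /* a large sysctl_tcp_reordering therefore does two things at once:
     * it keeps FACK from being disabled (the metric rarely exceeds it)
     * and it raises the threshold the FACK gap is compared against */
    return dupack_count(c) > c->reordering;
}

int main(void)
{
    struct conn c = { .fack_enabled = 1, .reordering = 20,
                      .fackets_out = 10, .sacked_out = 3 };

    printf("recover? %d\n", time_to_recover(&c)); /* 0: FACK gap < 20 */
    return 0;
}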
Thirdly, in most cases it will only trigger after spurious retransmits have already
happened. If it triggers in advance (due to the SACK logic), the updated reordering
metric will IMO be one too small, leading again to a spurious retransmit if a
reordering event of the same length happens again.
IOW, it mostly only reduces the damage to congestion control, but spurious
packets are sent out nevertheless.
In my view, one should at least build some EWMA or histogram, or whatever
statistic, to measure the detected reordering, and based on this measurement
adjust the dupthresh (or max_burst, or whatever). Of course, there is always
the question of how much better such a sophisticated statistic would work
than the current, very pragmatic solution...
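
Just to sketch the direction I have in mind (a user-space toy; the gain, margin
and floor values are invented and not tuned at all): feed each observed
reordering distance into an EWMA and derive the dupthresh from it with some
safety margin, so the estimate can also come back down.

#include <stdio.h>

struct reord_est {
    double ewma;                      /* smoothed reordering distance, pkts */
};

static void reord_sample(struct reord_est *e, int distance)
{
    const double gain = 0.125;        /* like srtt's 1/8 gain, arbitrary */

    if (e->ewma == 0.0)
        e->ewma = distance;
    else
        e->ewma += gain * (distance - e->ewma);
}

static int reord_dupthresh(const struct reord_est *e)
{
    int thresh = (int)(e->ewma * 1.25) + 1;   /* safety margin, arbitrary */

    return thresh < 3 ? 3 : thresh;           /* never below the classic 3 */
}

int main(void)
{
    struct reord_est e = { 0.0 };
    /* a distance of 0 stands for "no reordering seen in this interval" */
    int samples[] = { 5, 7, 6, 0, 0, 0, 0, 0, 0, 0 };
    int i;

    for (i = 0; i < 10; i++) {
        reord_sample(&e, samples[i]);
        printf("sample=%d ewma=%.2f dupthresh=%d\n",
               samples[i], e.ewma, reord_dupthresh(&e));
    }
    return 0;
}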
Please correct me if I'm wrong or just too stupid to understand this stuff.
(very likely;-)
>
>> TCP-NCR is designed to mitigate this degraded performance by increasing the
>> number of duplicate acknowledgments required to trigger loss recovery,
>> based on the current state of the connection, in an effort to better
>> disambiguate true segment loss from segment reordering.
>
> We already have code in the stack which tries to detect packet
> reordering with a high level of sophistication.
On the contrary, RFC 4653 does not even try to detect reordering. It simply
delays the congestion response in a way that seems very straightforward.
Of course there is the negative impact of increased latency (loss recovery
takes longer). However, for large ftp/http transfers, who cares about latency?
There must already be some logic in the kernel that detects applications doing
bulk transfers, for the buffer autotuning; what about enabling RFC 4653
when such an application is detected?
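
For illustration, what I imagine "enabling RFC 4653" boils down to is roughly
the following (a user-space toy; the real RFC 4653 rules distinguish Careful
and Aggressive Limited Transmit and use different scaling factors, which I have
not reproduced here):

#include <stdio.h>

/* Toy version of the NCR idea: instead of a fixed dupthresh of 3, scale it
 * with the amount of outstanding data so that loss recovery is delayed by
 * roughly one RTT worth of duplicate ACKs.  The plain flight-size scaling
 * below is a simplification of the RFC. */
static int ncr_dupthresh(int flight_size_bytes, int smss)
{
    int thresh = flight_size_bytes / smss;

    return thresh < 3 ? 3 : thresh;     /* never below the classic 3 */
}

int main(void)
{
    /* e.g. a bulk transfer with 64 KB outstanding and a 1460-byte MSS */
    printf("dupthresh = %d\n", ncr_dupthresh(64 * 1024, 1460));
    return 0;
}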
Daniel, I would assume RFC 4653 would simply work together with FACK, at least
as long as no reordering is present?
Best regards,
Arnd
--
Dipl.-Inform. Arnd Hannemann
RWTH Aachen University
Dept. of Computer Science, Informatik 4
Ahornstr. 55, D-52074 Aachen, Germany
Phone: (+49 241) 80-21423 Fax: (+49 241) 80-22220