Message-ID: <AE90C24D6B3A694183C094C60CF0A2F6026B70F7@saturn3.aculab.com>
Date: Wed, 19 Dec 2012 09:52:54 -0000
From: "David Laight" <David.Laight@...LAB.COM>
To: "Rick Jones" <rick.jones2@...com>
Cc: "Cong Wang" <amwang@...hat.com>, <netdev@...r.kernel.org>,
"Ben Greear" <greearb@...delatech.com>,
"David Miller" <davem@...emloft.net>,
"Eric Dumazet" <eric.dumazet@...il.com>,
"Stephen Hemminger" <shemminger@...tta.com>,
"Thomas Graf" <tgraf@...hat.com>
Subject: RE: TCP delayed ACK heuristic
> > I've seen problems when the sending side is doing (I think)
> > 'slow start' with Nagle disabled.
> > The sender would only send 4 segments before waiting for an
> > ACK - even when it had more than a full sized segment waiting.
> > Sender was Linux 2.6.something (probably low 20s).
> > I changed the application flow to send data in the reverse
> > direction to avoid the problem.
> > That was on a ~0 delay local connection - which means that
> > there is almost never outstanding data, and the 'slow start'
> > happened almost all the time.
> > Nagle is completely the wrong algorithm for the data flow.
>
> If Nagle was already disabled, why the last sentence? And from your
> description, even if Nagle were enabled, I would think that it was
> remote ACK+cwnd behaviour getting in your way, not Nagle, given that
> Nagle is to be decided on a user-send by user-send basis and release
> queued data (to the mercies of other heuristics) when it gets to be an
> MSS-worth.
With Nagle enabled the first segment is sent and the following ones
get buffered until full segments can be sent. Although (probably)
only 4 segments will be sent (1 small and 3 full), the 3rd of these
does generate an ack.
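For reference, the small-write pattern involved looks roughly like
this (a minimal sketch - 'fd' is an already-connected TCP socket and
the payloads are purely illustrative):

	#include <stdio.h>
	#include <sys/socket.h>

	/* With Nagle on (the default), the first small send() goes out
	 * immediately; later small writes are buffered until a full MSS
	 * accumulates or the outstanding data is ACKed. */
	static void send_small_units(int fd)
	{
		int i;

		for (i = 0; i < 8; i++) {
			char msg[32];
			int len = snprintf(msg, sizeof msg, "unit %d", i);

			if (send(fd, msg, (size_t)len, 0) < 0)
				perror("send");
		}
	}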
> ... but it sounds like you have an actual
> application looking to do that??
We are relaying data packets received over multiple SS7 signalling
links (64k hdlc) over a TCP connection. The connection will be local,
in some cases the host ethernet MAC, switch, and target cpu are all
on the same PCI(e) card (MII crossover links).
While a delay of a millisecond or two wouldn't matter (at 64 kbit/s,
1ms is 8 byte times), the Nagle delay is far too long - and since the
data isn't command/response the Nagle delay would happen a lot.
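So we just turn Nagle off on the relay socket - nothing exotic,
the standard setsockopt:

	#include <netinet/in.h>
	#include <netinet/tcp.h>
	#include <sys/socket.h>

	/* Disable Nagle so each relayed signalling unit is sent at
	 * once instead of waiting for the previous segment's ACK. */
	static int disable_nagle(int fd)
	{
		int one = 1;

		return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
				  &one, sizeof one);
	}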
One of the conformance tests managed to make the system 'busy'.
Since all it does is keep one 64k channel busy it shouldn't have
been able to generate a backlog of receive data - but it managed to
get over 100 data packets unacked (application-level ack).
> Allowing a byte-limit-cwnd's worth of single-byte-payload TCP segments
> could easily be seen as being rather anti-social :)
If the actual RTT is almost zero (as in our case) and the network
really shouldn't be dropping packets, then it doesn't matter.
I suspect that if the transmit interval is shorter than the RTT
then 'slow start' turns off and you can get a lot of small segments
in flight. But when the RTT is near zero 'slow start' almost always
applies and you only send 4.
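You can watch this happening via the (Linux-specific) TCP_INFO
sockopt - a rough sketch, field names as in netinet/tcp.h:

	#include <stdio.h>
	#include <netinet/in.h>
	#include <netinet/tcp.h>
	#include <sys/socket.h>

	/* Print the congestion window and RTT the stack is using.
	 * On our near-zero-RTT links snd_cwnd sits around its initial
	 * value, which is what caps us at ~4 segments in flight. */
	static void dump_cwnd(int fd)
	{
		struct tcp_info ti;
		socklen_t len = sizeof ti;

		if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
			printf("snd_cwnd=%u rtt=%uus\n",
			       ti.tcpi_snd_cwnd, ti.tcpi_rtt);
	}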
> And forcing/maintaining the original segment boundaries in
> retransmissions for small packets isn't such a hot idea either.
True, not splitting them might be useful, but there's no need to
avoid merges.
David