Message-ID: <AE90C24D6B3A694183C094C60CF0A2F6026B70F7@saturn3.aculab.com>
Date: Wed, 19 Dec 2012 09:52:54 -0000
From: "David Laight" <David.Laight@...LAB.COM>
To: "Rick Jones" <rick.jones2@...com>
Cc: "Cong Wang" <amwang@...hat.com>, <netdev@...r.kernel.org>,
"Ben Greear" <greearb@...delatech.com>,
"David Miller" <davem@...emloft.net>,
"Eric Dumazet" <eric.dumazet@...il.com>,
"Stephen Hemminger" <shemminger@...tta.com>,
"Thomas Graf" <tgraf@...hat.com>
Subject: RE: TCP delayed ACK heuristic
> > I've seen problems when the sending side is doing (I think)
> > 'slow start' with Nagle disabled.
> > The sender would only send 4 segments before waiting for an
> > ACK - even when it had more than a full sized segment waiting.
> > Sender was Linux 2.6.something (probably low 20s).
> > I changed the application flow to send data in the reverse
> > direction to avoid the problem.
> > That was on a ~0 delay local connection - which means that
> > there is almost never outstanding data, and the 'slow start'
> > happened almost all the time.
> > Nagle is completely the wrong algorithm for the data flow.
>
> If Nagle was already disabled, why the last sentence? And from your
> description, even if Nagle were enabled, I would think that it was
> remote ACK+cwnd behaviour getting in your way, not Nagle, given that
> Nagle is to be decided on a user-send by user-send basis and release
> queued data (to the mercies of other heuristics) when it gets to be an
> MSS-worth.
With Nagle enabled the first segment is sent and the following ones
get buffered until full segments can be sent. Although (probably)
only 4 segments will be sent (1 small and 3 full), the 3rd of these
does generate an ack.
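For reference, the small-write pattern involved looks roughly like
this (a minimal sketch - 'fd' is an already-connected TCP socket and
the payloads are purely illustrative):

	#include <stdio.h>
	#include <sys/socket.h>

	/* With Nagle on (the default), the first small send() goes out
	 * immediately; later small writes are buffered until a full MSS
	 * accumulates or the outstanding data is ACKed. */
	static void send_small_units(int fd)
	{
		int i;

		for (i = 0; i < 8; i++) {
			char msg[32];
			int len = snprintf(msg, sizeof msg, "unit %d", i);

			if (send(fd, msg, (size_t)len, 0) < 0)
				perror("send");
		}
	}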
> ... but it sounds like you have an actual
> application looking to do that??
We are relaying data packets received over multiple SS7 signalling
links (64k hdlc) over a TCP connection. The connection will be local,
in some cases the host ethernet MAC, switch, and target cpu are all
on the same PCI(e) card (MII crossover links).
While a delay of a millisecond or two wouldn't matter (at 64 kbit/s,
1ms is 8 byte times), the Nagle delay is far too long - and since the
data isn't command/response the Nagle delay would happen a lot.
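So we just turn Nagle off on the relay socket - nothing exotic,
the standard setsockopt:

	#include <netinet/in.h>
	#include <netinet/tcp.h>
	#include <sys/socket.h>

	/* Disable Nagle so each relayed signalling unit is sent at
	 * once instead of waiting for the previous segment's ACK. */
	static int disable_nagle(int fd)
	{
		int one = 1;

		return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
				  &one, sizeof one);
	}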
One of the conformance tests managed to make the system 'busy'.
Since all it does is keep one 64k channel busy it shouldn't have
been able to generate a backlog of receive data - but it managed to
get over 100 data packets unacked (application-level ack).
> Allowing a byte-limit-cwnd's worth of single-byte-payload TCP segments
> could easily be seen as being rather anti-social :)
If the actual RTT is almost zero (as in our case) and the network
really shouldn't be dropping packets, then it doesn't matter.
I suspect that if the transmit interval is shorter than the RTT
then 'slow start' turns off and you can get a lot of small segments
in flight. But when the RTT is near zero 'slow start' almost always
applies and you only send 4.
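You can watch this happening via the (Linux-specific) TCP_INFO
sockopt - a rough sketch, field names as in netinet/tcp.h:

	#include <stdio.h>
	#include <netinet/in.h>
	#include <netinet/tcp.h>
	#include <sys/socket.h>

	/* Print the congestion window and RTT the stack is using.
	 * On our near-zero-RTT links snd_cwnd sits around its initial
	 * value, which is what caps us at ~4 segments in flight. */
	static void dump_cwnd(int fd)
	{
		struct tcp_info ti;
		socklen_t len = sizeof ti;

		if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
			printf("snd_cwnd=%u rtt=%uus\n",
			       ti.tcpi_snd_cwnd, ti.tcpi_rtt);
	}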
> And forcing/maintaining the original segment boundaries in
> retransmissions for small packets isn't such a hot idea either.
True, not splitting them might be useful, but there's no need to
avoid merges.
David