netdev - TCP one-by-one acking - RFC interpretation question

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Date:   Fri, 6 Apr 2018 12:05:11 +0200
From:   Michal Kubecek <mkubecek@...e.cz>
To:     netdev@...r.kernel.org
Subject: TCP one-by-one acking - RFC interpretation question

Hello,

I encountered a strange behaviour of some (non-linux) TCP stack which
I believe is incorrect but support engineers from the company producing
it claim is OK.

Assume a client (sender, Linux 4.4 kernel) sends a stream of MSS sized
segments but segments 2, 4 and 6 do not reach the server (receiver):

         ACK             SAK             SAK             SAK
      +-------+-------+-------+-------+-------+-------+-------+
      |   1   |   2   |   3   |   4   |   5   |   6   |   7   |
      +-------+-------+-------+-------+-------+-------+-------+
    34273   35701   37129   38557   39985   41413   42841   44269

When segment 2 is retransmitted after RTO timeout, normal response would
be ACK-ing segment 3 (38557) with SACK for 5 and 7 (39985-41413 and
42841-44269).

However, this server stack responds with two separate ACKs:

  - ACK 37129, SACK 37129-38557 39985-41413 42841-44269
  - ACK 38557, SACK 39985-41413 42841-44269

There is no payload from server, no window update and it happens even if
there is no other packet received by server between those two. The
result is that as segment 3 was never retransmitted, second ACK is
interpreted as acking a newly arrived segment by 4.4 kernel so that the
whole interval between first transmission of segment 3 and this second
ACK is used for RTT estimator; even worse, when the same happens again
for segment 5, both timeouts (for 2 and 4) are counted into its RTT.
The result is RTO growing exponentially until it reaches the maximum
(120 seconds) and the connection is effectively stalled.

In my opinion, server behaviour violates the last paragraph of RFC 5681,
section 4.2:

  A TCP receiver MUST NOT generate more than one ACK for every incoming
  segment, other than to update the offered window as the receiving
  application consumes new data (see [RFC813] and page 42 of [RFC793]).

Server vendor claims that their behaviour is correct as first ACK is
sent in response to segment 2 and second ACK in response to segment 3
(which has just been delayed in the out of order queue).

Note that SACK doesn't really help here. First SACK block in first ACK
(37129-38557) is actually invalid as it violates the "the bytes just
below the block ... have not been received" condition from RFC 2018
section 3. Therefore Linux 4.4 stack ignores this SACK block, detects
(spurious) SACK reneging and unmarks the "previously sacked" flag of
segment 3 so that when second ACK arrives, there is no trace of it
having been sacked before. They already admitted this SACK block is
incorrect but there is still disagreement about the "one-by-one acking"
behaviour in general.

My question is: is my interpretation correct? If so, is there an even
less ambiguous statement somewhere that receiver is supposed to send one
ACK for "everything they got so far" rather than acking the segments one
by one? While reading the RFCs, I always considered this obvious but
apparently some people may think otherwise.

Thanks in advance,
Michal Kubecek