netdev - tcp: picking a less conservative SACK RTT for congestion control

Message-ID: <CA++eYdt8YT+xp88N7KFX3OMoHO3sVS6yfnuPDHLrWpM2w3NzRw@mail.gmail.com>
Date:	Sat, 11 Apr 2015 21:50:15 +0200
From:	Kenneth Klette Jonassen <kennetkl@....uio.no>
To:	netdev@...r.kernel.org
Cc:	Eric Dumazet <edumazet@...gle.com>,
	Neal Cardwell <ncardwell@...gle.com>,
	Yuchung Cheng <ycheng@...gle.com>,
	Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>,
	Stephen Hemminger <stephen@...workplumber.org>
Subject: tcp: picking a less conservative SACK RTT for congestion control

tcp_sacktag_one() currently uses the earliest sacked sequence for the RTT
sample. This makes sense when data is sacked due to reordering, as described
in commit 832d11c5 ("Try to restore large SKBs while SACK processing"). But
it might not make sense for congestion control (CC) in cases where:

 1. ACKs are lost, i.e. the SACK following a lost SACK covers both a segment
    that just arrived at the receiver and an older one that the lost SACK
    would have reported. A concrete example follows below.
 2. The receiver disregards the RFC 5681 recommendation to immediately ACK
    out-of-order segments, perhaps due to a hardware offload mechanism.
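
To make the two policies concrete, here is a minimal user-space sketch, not
kernel code. struct sacked_seg and sack_rtt_samples() are hypothetical
stand-ins for the skbs walked by tcp_sacktag_one() and their skb_mstamp send
times. As assumed here, only never-retransmitted segments give a sample; the
first one in sequence order yields the conservative RTT, the last one the
freshest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one newly SACKed segment; not a kernel type. */
struct sacked_seg {
	uint32_t start_seq;    /* first sequence number covered */
	uint32_t end_seq;      /* one past the last sequence number covered */
	uint64_t xmit_time_us; /* send time, standing in for the skb_mstamp */
	bool retransmitted;    /* retransmitted data gives ambiguous RTTs */
};

/*
 * Walk the newly SACKed segments in sequence order, as the sacktag loop
 * does, and report two candidate RTT samples:
 *   first: from the earliest sequence (the current, conservative choice)
 *   last:  from the latest sequence (the freshest delay signal for CC)
 * Retransmitted segments are skipped. Returns false, leaving both samples
 * at -1, if no never-retransmitted segment was SACKed.
 */
static bool sack_rtt_samples(const struct sacked_seg *segs, size_t n,
			     uint64_t now_us, long *first_rtt_us,
			     long *last_rtt_us)
{
	assert(segs != NULL || n == 0);
	*first_rtt_us = -1;
	*last_rtt_us = -1;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].retransmitted)
			continue;
		long rtt_us = (long)(now_us - segs[i].xmit_time_us);
		if (*first_rtt_us < 0)
			*first_rtt_us = rtt_us; /* keep only the first sample */
		*last_rtt_us = rtt_us;          /* overwritten by each later one */
	}
	return *first_rtt_us >= 0;
}
```

Fed the two sacked ranges from the trace at the end of this mail, with send
times chosen to match, the sketch returns 75129 us as the first sample and
70224 us as the last.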

We have an implementation of the experimental congestion control algorithm
CDG [1], which can perform slightly better in environments with random loss.
Unlike, e.g., Vegas, which resets all of its internal state when loss is
detected, CDG is quite sensitive to recent RTT changes even during loss
recovery.
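
To illustrate why a stale sample matters to a delay-gradient controller, here
is a minimal sketch of the signal the CDG paper [1] builds on, as we read it:
the change in minimum and maximum RTT between consecutive RTT intervals. It
omits CDG's smoothing and probabilistic backoff, and struct rtt_interval,
rtt_interval_from() and delay_gradients() are hypothetical helpers, not taken
from any implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum and maximum RTT seen during one RTT interval (hypothetical). */
struct rtt_interval {
	long min_us;
	long max_us;
};

/* Summarise the RTT samples collected during one interval (n >= 1). */
static struct rtt_interval rtt_interval_from(const long *samples_us, size_t n)
{
	assert(samples_us != NULL && n > 0);
	struct rtt_interval iv = { samples_us[0], samples_us[0] };

	for (size_t i = 1; i < n; i++) {
		if (samples_us[i] < iv.min_us)
			iv.min_us = samples_us[i];
		if (samples_us[i] > iv.max_us)
			iv.max_us = samples_us[i];
	}
	return iv;
}

/*
 * Unsmoothed gradients between two consecutive intervals:
 *   g_min = min[n] - min[n-1],  g_max = max[n] - max[n-1].
 * A positive gradient suggests a growing queue.
 */
static void delay_gradients(struct rtt_interval prev, struct rtt_interval cur,
			    long *g_min_us, long *g_max_us)
{
	*g_min_us = cur.min_us - prev.min_us;
	*g_max_us = cur.max_us - prev.max_us;
}
```

With made-up samples where the previous interval saw 70000 and 70500 us, a
current interval of {70224, 70300} us gives g_max = -200 us; replacing the
70224 us sample with the conservative 75129 us one from the trace below gives
g_max = +4629 us, i.e. an apparent queue build-up.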

What would be a feasible approach to tracking the last segment sacked? I was
thinking of keeping first/last skb_mstamps in struct tcp_sacktag_state, akin
to the way it is done in tcp_clean_rtx_queue(). This would require passing
eight more bytes around on 64-bit. A slightly obscure alternative is to
store the delta between the first and last sack in a 4-byte value. Since
struct tcp_sacktag_state currently has 4 bytes of padding, this does not
require passing more data around -- just changing "long sack_rtt_us" to
a pointer. It could have some minor cache locality impact, though. I
expect that both approaches would save the call to skb_mstamp_get() in
tcp_sacktag_one().
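
The byte counts can be checked with sizeof on a hypothetical user-space
mirror. The field list of struct state_now only approximates struct
tcp_sacktag_state, struct mstamp_model assumes skb_mstamp is 8 bytes, and
what the option-B pointer refers to is an assumption. On an LP64 target,
option A grows the struct from 24 to 32 bytes while option B stays at 24.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-byte stand-in for struct skb_mstamp (hypothetical mirror). */
struct mstamp_model {
	uint64_t v64;
};

/* Roughly today's state: a single RTT value. */
struct state_now {
	int reord;
	int fack_count;
	long sack_rtt_us;
	int flag; /* followed by 4 bytes of tail padding on LP64 */
};

/* Option A: first and last timestamps, as in tcp_clean_rtx_queue(). */
struct state_first_last {
	int reord;
	int fack_count;
	struct mstamp_model first_sackt;
	struct mstamp_model last_sackt;
	int flag;
};

/* Option B: a pointer in place of the long, plus a 32-bit delta that
 * occupies the former tail padding. The pointer target is assumed. */
struct state_ptr_delta {
	int reord;
	int fack_count;
	const struct mstamp_model *first_sackt;
	int flag;
	uint32_t sack_delta_us; /* last minus first send time */
};

/* With option B, the later RTT follows from the earlier one: the later
 * segment was sent delta_us after the first, so its RTT is that much less. */
static long last_rtt_from_delta(long first_rtt_us, uint32_t delta_us)
{
	assert((long)delta_us <= first_rtt_us);
	return first_rtt_us - (long)delta_us;
}
```

With the trace values below, a first RTT of 75129 us and a stored delta of
4905 us give back the later 70224 us sample.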

1. http://caia.swin.edu.au/cv/dahayes/content/networking2011-cdg-preprint.pdf

PS: The pkts_acked CC hook is currently not called unless new data is
cumulatively acked. I have a simple patch that calls it for new SACK RTTs,
but I am holding it back until my recent patch (fix bogus RTT for CC) is
reviewed.
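
As a rough model of the difference, the sketch below dispatches a per-ACK RTT
hook under today's rule (cumulative ACK only) and under an optional rule that
also forwards a fresh SACK RTT. It is not the patch mentioned above:
pkts_acked_hook, dispatch_pkts_acked() and struct toy_cc are hypothetical,
and passing num_acked = 0 for a SACK-only sample is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook type, modelled loosely on pkts_acked in
 * struct tcp_congestion_ops; rtt_us < 0 means "no valid sample". */
typedef void (*pkts_acked_hook)(void *cc_state, uint32_t num_acked,
				int32_t rtt_us);

/* A toy CC module that only records what it was told. */
struct toy_cc {
	int calls;
	uint32_t last_acked;
	int32_t last_rtt_us;
};

static void toy_pkts_acked(void *cc_state, uint32_t num_acked, int32_t rtt_us)
{
	struct toy_cc *cc = cc_state;

	cc->calls++;
	cc->last_acked = num_acked;
	cc->last_rtt_us = rtt_us;
}

/*
 * Per-ACK dispatch. Today's rule: call the hook only if the cumulative ACK
 * advanced. With report_sack_rtt set, also pass on a fresh SACK RTT when
 * nothing was cumulatively acked. Returns true if the hook ran.
 */
static bool dispatch_pkts_acked(pkts_acked_hook hook, void *cc_state,
				uint32_t cum_acked, int32_t ack_rtt_us,
				int32_t sack_rtt_us, bool report_sack_rtt)
{
	assert(hook != NULL);
	if (cum_acked > 0) {
		hook(cc_state, cum_acked, ack_rtt_us);
		return true;
	}
	if (report_sack_rtt && sack_rtt_us >= 0) {
		hook(cc_state, 0, sack_rtt_us);
		return true;
	}
	return false;
}
```

In the example below, every ACK repeats the same cumulative ACK point, so
under today's rule the 70224 us SACK sample never reaches the CC module.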

---

Concrete example. The path has 1% uniform loss and no reordering. The packet
captures below use delta timestamps and were taken separately at the sender
and at the receiver.

The receiver sends two ACKs:
00:00:00.005018 IP 10.0.1.2.5001 > 10.0.0.2.48089: Flags [.], ack 3824632751, win 32746, options [nop,nop,TS val 1820536519 ecr 2169294,nop,nop,sack 1 {3824634199:3824651575}], length 0
00:00:00.004871 IP 10.0.1.2.5001 > 10.0.0.2.48089: Flags [.], ack 3824632751, win 32746, options [nop,nop,TS val 1820536524 ecr 2169294,nop,nop,sack 1 {3824634199:3824653023}], length 0

Only the second one reaches the sender:
00:00:00.009842 IP 10.0.1.2.5001 > 10.0.0.2.48089: Flags [.], ack 3824632751, win 32746, options [nop,nop,TS val 1820536524 ecr 2169294,nop,nop,sack 1 {3824634199:3824653023}], length 0

Trace output at the sender:
8968.105153: tcp_sacktag_one: first sacked range 3824648679 - 3824651575 rtt 75129
8968.105157: tcp_sacktag_one: later sacked range 3824651575 - 3824653023 rtt 70224 (rtt not used)
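
The numbers above can be cross-checked with a little arithmetic; the helpers
below are purely illustrative. The first sacked range spans 2896 bytes
(2 x 1448) and the later one 1448 bytes, which is also how far the SACK block
grew between the receiver's two ACKs. The two RTT samples differ by 4905 us,
roughly the 4871 us that separated those ACKs at the receiver, so here the
sample handed to CC is about 7% higher than the freshest one available.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered by [start, end); unsigned subtraction handles sequence
 * number wraparound the same way TCP's 32-bit arithmetic does. */
static uint32_t seq_range_len(uint32_t start, uint32_t end)
{
	return end - start;
}

/* How much the earliest-sequence RTT sample exceeds the latest one. For a
 * single ACK, data sent earlier cannot show a smaller RTT than later data. */
static long rtt_gap_us(long first_rtt_us, long last_rtt_us)
{
	assert(first_rtt_us >= last_rtt_us);
	return first_rtt_us - last_rtt_us;
}
```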