Message-ID: <CAK6E8=e5tVX3cfwvaXxN7LEq6XggeT5Y7T0j4KcKRsA1b-77mQ@mail.gmail.com>
Date:	Mon, 14 Mar 2016 14:59:07 -0700
From:	Yuchung Cheng <ycheng@...gle.com>
To:	Bendik Rønning Opstad <bro.devel@...il.com>
Cc:	"David S. Miller" <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Neal Cardwell <ncardwell@...gle.com>,
	Andreas Petlund <apetlund@...ula.no>,
	Carsten Griwodz <griff@...ula.no>,
	Pål Halvorsen <paalh@...ula.no>,
	Jonas Markussen <jonassm@....uio.no>,
	Kristian Evensen <kristian.evensen@...il.com>,
	Kenneth Klette Jonassen <kennetkl@....uio.no>
Subject: Re: [PATCH v6 net-next 0/2] tcp: Redundant Data Bundling (RDB)

On Sun, Mar 13, 2016 at 4:18 PM, Bendik Rønning Opstad
<bro.devel@...il.com> wrote:
> On 03/10/2016 01:20 AM, Yuchung Cheng wrote:
>> I read the paper. I think the underlying idea is neat, but the
>> implementation is a little heavy-weight in that it requires changes on
>> the fast path (tcp_write_xmit) and space in the skb control blocks.
>
> Yuchung, thank you for taking the time to review the patch submission
> and read the paper.
>
> I must admit I was not particularly happy about the extra if-test on the
> fast path, and I fully understand the wish to keep the fast path as
> simple and clean as possible.
> However, is the performance hit that significant considering the branch
> prediction hint for the non-RDB path?
>
> The extra variable needed in the SKB CB does not require increasing the
> CB buffer size, thanks to the "tcp: refactor struct tcp_skb_cb" patch
> (http://patchwork.ozlabs.org/patch/510674); it uses only some of the
> space made available in the outgoing SKBs' CB. Therefore I hoped the
> extra variable would be acceptable.
>
>> ultimately this
>> patch is meant for a small set of specific applications.
>
> Yes, the RDB mechanism is aimed at a limited set of applications,
> specifically time-dependent applications that produce non-greedy,
> application limited (thin) flows. However, our hope is that RDB may
> greatly improve TCP's position as a viable alternative for applications
> transmitting latency sensitive data.
>
>> In my mental model (please correct me if I am wrong), recovery from
>> losses on these thin streams would mostly resort to RTOs instead of fast
>> recovery, due to the bursty nature of Internet losses.
>
> This depends on the transmission pattern of the applications, which
> varies a great deal, also between the different types of
> time-dependent applications that produce thin streams. For short flows,
> (bursty) loss at the end will result in an RTO (if TLP does not probe),
> but the thin streams are often long lived, and the applications
> producing them continue to write small data segments to the socket at
> intervals of tens to hundreds of milliseconds.
>
> What determines whether an RTO rather than a fast retransmit will resend
> the packet is the number of packets in flight (PIFs), which directly
> correlates with how often the application writes data to the socket
> relative to the RTT. As long as the number of packets successfully
> completing a round trip before the RTO fires is >= the dupACK threshold,
> the flows will not depend on RTOs (not considering TLP). Early retransmit
> and the TCP_THIN_DUPACK socket option will also affect the likelihood of
> RTOs vs fast retransmits.
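>
> Roughly sketched (just a back-of-the-envelope illustration, not code
> from the patch, and ignoring TLP):
>
> def relies_on_rto(write_interval_ms, rtt_ms, dupack_threshold=3):
>     # Packets that complete a round trip before recovery could start,
>     # i.e. roughly the number of PIFs for this write pattern. The
>     # classic dupACK threshold is 3; TCP_THIN_DUPACK effectively
>     # lowers it to 1 for thin streams.
>     pifs = rtt_ms // write_interval_ms
>     # With fewer dupACK-generating packets than the threshold, a fast
>     # retransmit is never triggered and recovery falls back to the RTO.
>     return pifs < dupack_threshold
>
> print(relies_on_rto(30, 150))   # False: ~5 PIFs, fast retransmit likely
> print(relies_on_rto(100, 150))  # True: ~1 PIF, recovery depends on the RTO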
>
>> The HOLB comes from the RTO only retransmitting the first (tiny)
>> unacked packet while a small amount of new data is readily available.
>> But since Linux congestion control is packet-based, and the cwnd after
>> a loss is 1, the new data needs to wait until the 1st packet is acked,
>> which takes another RTT.
>
> If I understand you correctly, you are referring to HOLB on the sender
> side, which is the extra delay on new data that is held back when the
> connection is CWND-limited. In the paper, we refer to this extra delay
> as increased sojourn times for the outgoing data segments.
>
> We do not include this additional sojourn time for the segments on the
> sender side in the ACK Latency plots (Fig. 4 in the paper). This is
> simply because the pcap traces contain the timestamps when the packets
> are sent, and not when the segments are added to the output queue.
>
> When we refer to the HOLB effect in the paper as well as the thesis, we
> refer to the extra delays (sojourn times) on the receiver side where
> segments are held back (not made available to user space) due to gaps in
> the sequence range when packets are lost (we had no reordering).
>
> So, when considering the increased delays due to HOLB on the receiver
> side, HOLB is not at all limited to RTOs. Actually, it's mostly not due
> to RTOs in the tests we've run; however, this also depends very much on
> the transmission pattern of the application as well as the loss levels.
> In general, HOLB on the receiver side will affect any flow that
> transmits a packet with new data after a packet is lost (sender may not
> know yet), where the lost packet has not already been retransmitted.
OK that makes sense.

I left some detailed comments on the actual patches. I would encourage
you to submit an IETF draft to gather feedback from tcpm, because the
feature seems portable.

>
> Consider a sender application that performs write calls every 30 ms on a
> 150 ms RTT link. It will need a CWND that allows 5-6 PIFs to be able to
> transmit all new data segments with no extra sojourn times on the sender
> side.
> When one packet is lost, the next 5 packets that are sent will be held
> back on the receiver side due to the missing segment (HOLB). In the best
> case scenario, the first dupACK triggers a fast retransmit around the
> same time as the fifth packet (after the lost packet) is sent. In that
> case, the first segment sent after the lost segment is held back on the
> receiver for 150 ms (the time it takes for the dupACK to reach the
> sender, and the fast retrans to arrive at the receiver). The second is
> held back 120 ms, the third 90 ms, the fourth 60 ms, and the fifth 30 ms.
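>
> That arithmetic can be sketched like this (a toy calculation of the
> best-case scenario above, not code from the patch):
>
> def holb_delays_ms(write_interval_ms, rtt_ms, packets_after_loss):
>     # Best case: the first packet after the hole produces a dupACK that
>     # immediately triggers a fast retransmit, so the retransmission
>     # reaches the receiver one RTT after that packet arrived.
>     return [max(rtt_ms - i * write_interval_ms, 0)
>             for i in range(packets_after_loss)]
>
> print(holb_delays_ms(30, 150, 5))  # [150, 120, 90, 60, 30]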
>
> All of this extra delay is added before the sender even knows there was
> a loss. How it decides to react to the loss signal (dupACKs) will
> further decide how much extra delays will be added in addition to the
> delays already inflicted on the segments by the HOLB.
>
>> Instead what if we only perform RDB on the (first and recurring) RTO
>> retransmission?
>
> That would change RDB from being a proactive mechanism to being
> reactive, i.e. change how the sender responds to the loss signal. The
> problem is that by this point (when the sender has received the loss
> signal), the HOLB on the receiver side has already caused significant
> increases to the application layer latency.
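>
> To make the proactive/reactive distinction concrete, here is a toy
> model of the bundling idea (names and details are made up for
> illustration; this is not the kernel implementation): every new small
> write goes out in a packet that also carries the already sent but
> still unacked bytes, so a single loss is repaired by the very next
> packet instead of by a retransmission triggered by a loss signal.
>
> def rdb_bundle(unacked_segments, new_segment, mss=1448):
>     # Walk backwards from the most recently sent segment so the bundled
>     # bytes form one contiguous sequence range ending with the new data.
>     bundled = []
>     size = len(new_segment)
>     for seg in reversed(unacked_segments):
>         if size + len(seg) > mss:
>             break  # would no longer fit in a single packet
>         bundled.insert(0, seg)
>         size += len(seg)
>     return b"".join(bundled) + new_segment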
>
> The reason the RDB streams (in red) in fig. 4 in the paper get such low
> latencies is that there are almost no retransmissions. With 10%
> uniform loss, the latency for 90% of the packets is not affected at all.
> The latency for most of the lost segments is only increased by 30 ms,
> which is when the next RDB packet arrives at the receiver with the lost
> segment bundled in the payload.
> For the regular TCP streams (blue), the latency for 40% of the segments
> is affected, with almost 30% of the segments having additional delays of
> 150 ms or more.
> It is important to note that the increases to the latencies for the
> regular TCP streams compared to the RDB streams are solely due to HOLB
> on the receiver side.
>
> The longer the RTT, the greater the gains from using RDB, considering
> the best-case scenario of a minimum of one RTT required for a
> retransmission. As such, RDB will reduce the latencies the most for
> those that also need it the most.
>
> However, even with an RTT of 20 ms, an application writing a data
> segment every 10 ms will still get significant latency reductions simply
> because a retransmission will require a minimum of 20 ms, compared to
> the 10 ms it takes for the next RDB packet to arrive at the receiver.
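>
> Put as a back-of-the-envelope comparison (best case, one write per
> interval, not code from the patch):
>
> def best_case_recovery_ms(rtt_ms, write_interval_ms, rdb):
>     # Lost data reaches the receiver either in the next bundled packet
>     # (RDB) or, at the earliest, in a fast retransmission one RTT later.
>     return write_interval_ms if rdb else rtt_ms
>
> print(best_case_recovery_ms(150, 30, rdb=False))  # 150 ms
> print(best_case_recovery_ms(150, 30, rdb=True))   #  30 ms
> print(best_case_recovery_ms(20, 10, rdb=False))   #  20 ms
> print(best_case_recovery_ms(20, 10, rdb=True))    #  10 ms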
>
>
> Bendik
