Message-ID: <CADVnQyn4faThmR+_cPDseX3Q_U8U8340D9a2vyW5SmkYC1n2MQ@mail.gmail.com>
Date: Tue, 3 Jul 2018 11:25:57 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Lawrence Brakmo <brakmo@...com>
Cc: Netdev <netdev@...r.kernel.org>, Kernel Team <Kernel-team@...com>,
bmatheny@...com, ast@...com, Yuchung Cheng <ycheng@...gle.com>,
Steve Ibanez <sibanez@...nford.edu>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH net-next v2 0/2] tcp: fix high tail latencies in DCTCP
On Tue, Jul 3, 2018 at 11:10 AM Lawrence Brakmo <brakmo@...com> wrote:
>
> On 7/2/18, 5:52 PM, "netdev-owner@...r.kernel.org on behalf of Neal Cardwell" <netdev-owner@...r.kernel.org on behalf of ncardwell@...gle.com> wrote:
>
> On Mon, Jul 2, 2018 at 5:39 PM Lawrence Brakmo <brakmo@...com> wrote:
> >
> > We have observed high tail latencies when using DCTCP for RPCs as
> > compared to using Cubic. For example, in one setup there are 2 hosts
> > sending to a 3rd one, with each sender having 3 flows (1 stream,
> > 1 flow of back-to-back 1MB RPCs, and 1 flow of back-to-back 10KB
> > RPCs). The following table shows the 99% and 99.9% latencies for
> > both Cubic and DCTCP:
> >
> >              Cubic 99%  Cubic 99.9%  dctcp 99%  dctcp 99.9%
> >    1MB RPCs     2.6ms        5.5ms       43ms        208ms
> >   10KB RPCs     1.1ms        1.3ms       53ms        212ms
> >
> > Looking at tcpdump traces showed that there are two causes for the
> > latency.
> >
> > 1) RTOs caused by the receiver sending a dup ACK and not ACKing
> > the last (and only) packet sent.
> > 2) Delaying ACKs when the sender has a cwnd of 1, so everything
> > pauses for the duration of the delayed ACK.
> >
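A quick back-of-the-envelope check (illustration only, assuming the usual Linux
defaults of a 200 ms minimum RTO and a 40 ms minimum delayed-ACK timeout, which
are assumptions here rather than values measured in this setup) shows that a
single stall of either kind is enough to produce tails of this size:

#include <stdio.h>

/* Back-of-the-envelope only: one RTO or one delayed-ACK stall per RPC.
 * The 200 ms and 40 ms figures are the usual Linux defaults and are
 * assumptions for illustration, not measurements from this setup.
 */
int main(void)
{
        const double rto_min_ms    = 200.0;
        const double delack_min_ms =  40.0;
        const double base_99p9_ms  =   5.5;  /* Cubic 1MB 99.9% from the table above */

        printf("RPC that hits one RTO:               ~%.0f ms\n",
               base_99p9_ms + rto_min_ms);
        printf("RPC that waits out one delayed ACK:  ~%.0f ms\n",
               delack_min_ms);
        return 0;
}

That lands in the same ballpark as the 208/212 ms and 43/53 ms tails in the
table above.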
> > The first patch fixes the cause of the dup ACKs: DCTCP state was not
> > being updated when an ACK that was initially delayed was later sent
> > with a data packet.
> >
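Purely as an illustration of that bookkeeping, here is a self-contained
userspace sketch (hypothetical names and structure, not the kernel code): any
ACK that actually leaves the host, including one piggy-backed on reply data,
has to clear the receiver's 'delayed ACK owed' state, otherwise the next CE
state change emits a stale duplicate ACK instead of covering the newest packet.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model of the receiver-side bookkeeping (hypothetical names, not the
 * kernel code).  A DCTCP-style receiver flips the ECE bit it echoes at a
 * CE state change; if an ACK is still owed for data received under the
 * old state, that ACK must go out first, with the old ECE value.  The bug:
 * an ACK carried on a data packet did not clear the 'ACK owed' state, so
 * the next CE change emitted a duplicate ACK (with ECE) instead of
 * acknowledging the newest packet, and the sender eventually hit an RTO.
 */
struct toy_rx {
        uint32_t rcv_nxt;        /* next sequence expected              */
        bool     ce_state;       /* ECE value currently being echoed    */
        bool     delack_pending; /* an ACK is owed but has been delayed */
};

static void emit_ack(const struct toy_rx *rx, bool ece, const char *why)
{
        printf("  ACK %u ECE=%d (%s)\n", (unsigned)rx->rcv_nxt, ece, why);
}

/* One incoming data segment.  'reply' means the application answers right
 * away, so the ACK rides on the outgoing reply.  'fixed' toggles the
 * bookkeeping that the first patch is about.
 */
static void rx_data(struct toy_rx *rx, uint32_t len, bool ce,
                    bool reply, bool fixed)
{
        if (ce != rx->ce_state) {
                /* CE state change: flush any ACK owed for data received
                 * under the old state, echoing the old ECE value.  With a
                 * stale delack_pending this is the dup ACK from the trace.
                 */
                if (rx->delack_pending) {
                        emit_ack(rx, rx->ce_state, "flush at CE change");
                        rx->delack_pending = false;
                }
                rx->ce_state = ce;
        }

        rx->rcv_nxt += len;
        rx->delack_pending = true;                   /* an ACK is now owed */

        if (reply) {
                emit_ack(rx, rx->ce_state, "piggy-backed on reply data");
                if (fixed)
                        rx->delack_pending = false;  /* the missing update */
        }
}

static void run(bool fixed)
{
        struct toy_rx rx = { .rcv_nxt = 1000 };

        printf("%s receiver:\n", fixed ? "fixed" : "buggy");
        rx_data(&rx, 1448, false, false, fixed); /* request data, ACK delayed    */
        rx_data(&rx, 1448, true,  true,  fixed); /* CE-marked tail, reply sent   */
        rx_data(&rx, 1448, false, false, fixed); /* next request with CWR, no CE */
}

int main(void)
{
        run(false);   /* prints an extra stale dup ACK at the last CE change */
        run(true);
        return 0;
}

The buggy run prints one extra "flush at CE change" ACK when the final (CWR,
non-CE) packet arrives, which is the dup ACK with ECE described in the trace
further down, while the ACK for the new packet sits waiting on the delayed-ACK
timer.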
> > The second patch ensures that an ACK is sent immediately when a
> > CWR marked packet arrives.
> >
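Again for illustration only (hypothetical names, not the patch itself), the
receiver-side decision boils down to something like:

#include <stdbool.h>
#include <stdio.h>

/* Sketch of the decision the second patch is about (hypothetical names,
 * not the kernel code): a data segment carrying the CWR bit means the
 * sender has just reduced its cwnd, possibly to a single packet, so
 * delaying the ACK can stall it for a full delayed-ACK timeout.
 */
enum ack_mode { ACK_DELAYED, ACK_NOW };

static enum ack_mode ack_decision(bool cwr_set)
{
        return cwr_set ? ACK_NOW : ACK_DELAYED;
}

int main(void)
{
        printf("normal data segment:  %s\n",
               ack_decision(false) == ACK_NOW ? "ack now" : "delay");
        printf("CWR-marked segment:   %s\n",
               ack_decision(true)  == ACK_NOW ? "ack now" : "delay");
        return 0;
}

Since the sender sets CWR on the first packet after reacting to ECE, its window
may be as small as one packet at that moment, so an immediate ACK is what keeps
it moving.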
> > With the patches the latencies for DCTCP now look like:
> >
> >              dctcp 99%  dctcp 99.9%
> >    1MB RPCs     5.8ms        6.9ms
> >   10KB RPCs     146us        203us
> >
> > Note that while the 1MB RPC tail latencies are higher than Cubic's,
> > the 10KB latencies are much smaller than Cubic's. These patches fix
> > issues on the receiver, but tcpdump traces indicate there is an
> > opportunity to also fix an issue at the sender that adds about 3ms
> > to the tail latencies.
> >
> > The following trace shows the issue that triggers an RTO (fixed by these patches):
> >
> > Host A sends the last packets of the request
> > Host B receives them, and the last packet is marked with congestion (CE)
> > Host B sends ACKs for packets not marked with congestion
> > Host B sends data packet with reply and ACK for packet marked with
> > congestion (TCP flag ECE)
> > Host A receives ACKs with no ECE flag
> > Host A receives data packet with ACK for the last packet of request
> > and which has TCP ECE bit set
> > Host A sends 1st data packet of the next request with TCP flag CWR
> > Host B receives the packet (as seen in tcpdump at B), no CE flag
> > Host B sends a dup ACK that also has the TCP ECE flag
> > Host A RTO timer fires!
> > Host A retransmits the 1st packet of the request
> > Host A receives an ACK for everything it has sent (i.e. Host B
> > did receive 1st packet of request)
> > Host A sends more packets…
> >
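One detail worth spelling out about this trace: the sender cannot get out of
this with a fast retransmit, because with only one packet in flight (the
CWR-marked one) it can never accumulate the usual three duplicate ACKs, so the
retransmission timer is the only recovery. A standalone illustration of that
arithmetic (the threshold of 3 is the classic default and an assumption here,
not something taken from the trace):

#include <stdio.h>

/* Why the trace ends in an RTO rather than a fast retransmit: fast
 * retransmit classically needs three duplicate ACKs, but with a single
 * packet in flight at most one (stale) dup ACK can arrive, so the
 * retransmission timer is the only recovery left.  The threshold of 3 is
 * the standard default, used here only for illustration.
 */
int main(void)
{
        const int dupack_threshold  = 3;
        const int packets_in_flight = 1;   /* just the CWR-marked packet */

        if (packets_in_flight >= dupack_threshold)
                printf("enough dup ACKs possible -> fast retransmit\n");
        else
                printf("at most %d dup ACK(s) -> wait for the RTO\n",
                       packets_in_flight);
        return 0;
}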
> > [PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent
> > [PATCH net-next v2 2/2] tcp: ack immediately when a cwr packet
> >
> > net/ipv4/tcp_input.c | 16 +++++++++++-----
> > net/ipv4/tcp_output.c | 4 ++--
> > 2 files changed, 13 insertions(+), 7 deletions(-)
>
> Thanks, Larry. Just for context, can you please let us know whether
> your tests included zero, one, or both of Eric's recent commits
> (listed below) that tuned the number of ACKs after ECN events? (Or
> maybe the tests were literally using a net-next kernel?) Just wanted
> to get a better handle on any possible interactions there.
>
> Thanks!
>
> neal
>
> Yes, my test kernel includes both patches listed below.
OK, great.
> BTW, I will send a new
> patch where I move the call to tcp_incr_quickack from tcp_ecn_check_ce to tcp_ecn_accept_cwr.
OK, sounds good to me.
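For anyone following along who is not deep in tcp_input.c: tcp_incr_quickack
essentially refills a small budget of ACKs that are sent immediately instead of
being delayed. A rough standalone sketch of that idea (hypothetical names, not
the kernel code); moving the refill so that it happens when a CWR segment is
accepted ties the immediate ACKs to the point where the sender has actually
reacted to the ECN signal:

#include <stdbool.h>
#include <stdio.h>

/* Standalone sketch (hypothetical names, not the kernel code) of quick-ack
 * mode: a small budget of ACKs that are sent immediately rather than
 * delayed.  Refilling the budget when a CWR-marked segment is accepted
 * limits the suppression of delayed ACKs to the case where the sender has
 * actually reacted to ECE.
 */
struct toy_ack_state {
        int quick;   /* how many upcoming ACKs should go out immediately */
};

static void incr_quickack(struct toy_ack_state *st, int budget)
{
        if (st->quick < budget)
                st->quick = budget;
}

static bool ack_now(struct toy_ack_state *st)
{
        if (st->quick > 0) {
                st->quick--;
                return true;   /* quick-ack mode: no delayed-ACK timer */
        }
        return false;          /* normal delayed-ACK path */
}

int main(void)
{
        struct toy_ack_state st = { 0 };

        incr_quickack(&st, 2);   /* e.g. refilled when a CWR segment is accepted */
        for (int i = 0; i < 4; i++)
                printf("segment %d: %s\n", i, ack_now(&st) ? "ack now" : "delay");
        return 0;
}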
thanks,
neal