Message-ID: <CADVnQyn4faThmR+_cPDseX3Q_U8U8340D9a2vyW5SmkYC1n2MQ@mail.gmail.com>
Date: Tue, 3 Jul 2018 11:25:57 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Lawrence Brakmo <brakmo@...com>
Cc: Netdev <netdev@...r.kernel.org>, Kernel Team <Kernel-team@...com>,
bmatheny@...com, ast@...com, Yuchung Cheng <ycheng@...gle.com>,
Steve Ibanez <sibanez@...nford.edu>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH net-next v2 0/2] tcp: fix high tail latencies in DCTCP
On Tue, Jul 3, 2018 at 11:10 AM Lawrence Brakmo <brakmo@...com> wrote:
>
> On 7/2/18, 5:52 PM, "netdev-owner@...r.kernel.org on behalf of Neal Cardwell" <netdev-owner@...r.kernel.org on behalf of ncardwell@...gle.com> wrote:
>
> On Mon, Jul 2, 2018 at 5:39 PM Lawrence Brakmo <brakmo@...com> wrote:
> >
> > We have observed high tail latencies when using DCTCP for RPCs as
> > compared to using Cubic. For example, in one setup there are 2 hosts
> > sending to a 3rd one, with each sender having 3 flows (1 stream,
> > 1 flow of back-to-back 1MB RPCs, and 1 flow of back-to-back 10KB
> > RPCs). The following table shows the 99% and 99.9% latencies for
> > both Cubic and DCTCP:
> >
> >              Cubic 99%  Cubic 99.9%  dctcp 99%  dctcp 99.9%
> >    1MB RPCs     2.6ms        5.5ms       43ms        208ms
> >   10KB RPCs     1.1ms        1.3ms       53ms        212ms
> >
> > Looking at tcpdump traces showed that there are two causes for the
> > latency.
> >
> > 1) RTOs caused by the receiver sending a dup ACK and not ACKing
> > the last (and only) packet sent.
> > 2) Delaying ACKs when the sender has a cwnd of 1, so everything
> > pauses for the duration of the delayed ACK.
> >
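A quick back-of-the-envelope check (illustration only, assuming the usual Linux
defaults of a 200 ms minimum RTO and a 40 ms minimum delayed-ACK timeout, which
are assumptions here rather than values measured in this setup) shows that a
single stall of either kind is enough to produce tails of this size:

#include <stdio.h>

/* Back-of-the-envelope only: one RTO or one delayed-ACK stall per RPC.
 * The 200 ms and 40 ms figures are the usual Linux defaults and are
 * assumptions for illustration, not measurements from this setup.
 */
int main(void)
{
        const double rto_min_ms    = 200.0;
        const double delack_min_ms =  40.0;
        const double base_99p9_ms  =   5.5;  /* Cubic 1MB 99.9% from the table above */

        printf("RPC that hits one RTO:               ~%.0f ms\n",
               base_99p9_ms + rto_min_ms);
        printf("RPC that waits out one delayed ACK:  ~%.0f ms\n",
               delack_min_ms);
        return 0;
}

That lands in the same ballpark as the 208/212 ms and 43/53 ms tails in the
table above.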
> > The first patch fixes the cause of the dup ACKs: DCTCP state was not
> > being updated when an ACK that was initially delayed was later sent
> > with a data packet.
> >
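Purely as an illustration of that bookkeeping, here is a self-contained
userspace sketch (hypothetical names and structure, not the kernel code): any
ACK that actually leaves the host, including one piggy-backed on reply data,
has to clear the receiver's 'delayed ACK owed' state, otherwise the next CE
state change emits a stale duplicate ACK instead of covering the newest packet.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model of the receiver-side bookkeeping (hypothetical names, not the
 * kernel code).  A DCTCP-style receiver flips the ECE bit it echoes at a
 * CE state change; if an ACK is still owed for data received under the
 * old state, that ACK must go out first, with the old ECE value.  The bug:
 * an ACK carried on a data packet did not clear the 'ACK owed' state, so
 * the next CE change emitted a duplicate ACK (with ECE) instead of
 * acknowledging the newest packet, and the sender eventually hit an RTO.
 */
struct toy_rx {
        uint32_t rcv_nxt;        /* next sequence expected              */
        bool     ce_state;       /* ECE value currently being echoed    */
        bool     delack_pending; /* an ACK is owed but has been delayed */
};

static void emit_ack(const struct toy_rx *rx, bool ece, const char *why)
{
        printf("  ACK %u ECE=%d (%s)\n", (unsigned)rx->rcv_nxt, ece, why);
}

/* One incoming data segment.  'reply' means the application answers right
 * away, so the ACK rides on the outgoing reply.  'fixed' toggles the
 * bookkeeping that the first patch is about.
 */
static void rx_data(struct toy_rx *rx, uint32_t len, bool ce,
                    bool reply, bool fixed)
{
        if (ce != rx->ce_state) {
                /* CE state change: flush any ACK owed for data received
                 * under the old state, echoing the old ECE value.  With a
                 * stale delack_pending this is the dup ACK from the trace.
                 */
                if (rx->delack_pending) {
                        emit_ack(rx, rx->ce_state, "flush at CE change");
                        rx->delack_pending = false;
                }
                rx->ce_state = ce;
        }

        rx->rcv_nxt += len;
        rx->delack_pending = true;                   /* an ACK is now owed */

        if (reply) {
                emit_ack(rx, rx->ce_state, "piggy-backed on reply data");
                if (fixed)
                        rx->delack_pending = false;  /* the missing update */
        }
}

static void run(bool fixed)
{
        struct toy_rx rx = { .rcv_nxt = 1000 };

        printf("%s receiver:\n", fixed ? "fixed" : "buggy");
        rx_data(&rx, 1448, false, false, fixed); /* request data, ACK delayed    */
        rx_data(&rx, 1448, true,  true,  fixed); /* CE-marked tail, reply sent   */
        rx_data(&rx, 1448, false, false, fixed); /* next request with CWR, no CE */
}

int main(void)
{
        run(false);   /* prints an extra stale dup ACK at the last CE change */
        run(true);
        return 0;
}

The buggy run prints one extra "flush at CE change" ACK when the final (CWR,
non-CE) packet arrives, which is the dup ACK with ECE described in the trace
further down, while the ACK for the new packet sits waiting on the delayed-ACK
timer.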
> > The second patch ensures that an ACK is sent immediately when a
> > CWR marked packet arrives.
> >
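Again for illustration only (hypothetical names, not the patch itself), the
receiver-side decision boils down to something like:

#include <stdbool.h>
#include <stdio.h>

/* Sketch of the decision the second patch is about (hypothetical names,
 * not the kernel code): a data segment carrying the CWR bit means the
 * sender has just reduced its cwnd, possibly to a single packet, so
 * delaying the ACK can stall it for a full delayed-ACK timeout.
 */
enum ack_mode { ACK_DELAYED, ACK_NOW };

static enum ack_mode ack_decision(bool cwr_set)
{
        return cwr_set ? ACK_NOW : ACK_DELAYED;
}

int main(void)
{
        printf("normal data segment:  %s\n",
               ack_decision(false) == ACK_NOW ? "ack now" : "delay");
        printf("CWR-marked segment:   %s\n",
               ack_decision(true)  == ACK_NOW ? "ack now" : "delay");
        return 0;
}

Since the sender sets CWR on the first packet after reacting to ECE, its window
may be as small as one packet at that moment, so an immediate ACK is what keeps
it moving.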
> > With the patches the latencies for DCTCP now look like:
> >
> >              dctcp 99%  dctcp 99.9%
> >    1MB RPCs     5.8ms        6.9ms
> >   10KB RPCs     146us        203us
> >
> > Note that while the 1MB RPC tail latencies are higher than Cubic's,
> > the 10KB latencies are much smaller than Cubic's. These patches fix
> > issues on the receiver, but tcpdump traces indicate there is an
> > opportunity to also fix an issue at the sender that adds about 3ms
> > to the tail latencies.
> >
> > The following trace shows the issue that triggers an RTO (fixed by these patches):
> >
> > Host A sends the last packets of the request
> > Host B receives them, and the last packet is marked with congestion (CE)
> > Host B sends ACKs for packets not marked with congestion
> > Host B sends data packet with reply and ACK for packet marked with
> > congestion (TCP flag ECE)
> > Host A receives ACKs with no ECE flag
> > Host A receives data packet with ACK for the last packet of request
> > and which has TCP ECE bit set
> > Host A sends 1st data packet of the next request with TCP flag CWR
> > Host B receives the packet (as seen in tcpdump at B), no CE flag
> > Host B sends a dup ACK that also has the TCP ECE flag
> > Host A RTO timer fires!
> > Host A retransmits the 1st packet of the request
> > Host A receives an ACK for everything it has sent (i.e. Host B
> > did receive 1st packet of request)
> > Host A sends more packets…
> >
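One detail worth spelling out about this trace: the sender cannot get out of
this with a fast retransmit, because with only one packet in flight (the
CWR-marked one) it can never accumulate the usual three duplicate ACKs, so the
retransmission timer is the only recovery. A standalone illustration of that
arithmetic (the threshold of 3 is the classic default and an assumption here,
not something taken from the trace):

#include <stdio.h>

/* Why the trace ends in an RTO rather than a fast retransmit: fast
 * retransmit classically needs three duplicate ACKs, but with a single
 * packet in flight at most one (stale) dup ACK can arrive, so the
 * retransmission timer is the only recovery left.  The threshold of 3 is
 * the standard default, used here only for illustration.
 */
int main(void)
{
        const int dupack_threshold  = 3;
        const int packets_in_flight = 1;   /* just the CWR-marked packet */

        if (packets_in_flight >= dupack_threshold)
                printf("enough dup ACKs possible -> fast retransmit\n");
        else
                printf("at most %d dup ACK(s) -> wait for the RTO\n",
                       packets_in_flight);
        return 0;
}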
> > [PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent
> > [PATCH net-next v2 2/2] tcp: ack immediately when a cwr packet
> >
> > net/ipv4/tcp_input.c | 16 +++++++++++-----
> > net/ipv4/tcp_output.c | 4 ++--
> > 2 files changed, 13 insertions(+), 7 deletions(-)
>
> Thanks, Larry. Just for context, can you please let us know whether
> your tests included zero, one, or both of Eric's recent commits
> (listed below) that tuned the number of ACKs after ECN events? (Or
> maybe the tests were literally using a net-next kernel?) Just wanted
> to get a better handle on any possible interactions there.
>
> Thanks!
>
> neal
>
> Yes, my test kernel includes both patches listed below.
OK, great.
> BTW, I will send a new
> patch where I move the call to tcp_incr_quickack from tcp_ecn_check_ce to tcp_ecn_accept_cwr.
OK, sounds good to me.
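For anyone following along who is not deep in tcp_input.c: tcp_incr_quickack
essentially refills a small budget of ACKs that are sent immediately instead of
being delayed. A rough standalone sketch of that idea (hypothetical names, not
the kernel code); moving the refill so that it happens when a CWR segment is
accepted ties the immediate ACKs to the point where the sender has actually
reacted to the ECN signal:

#include <stdbool.h>
#include <stdio.h>

/* Standalone sketch (hypothetical names, not the kernel code) of quick-ack
 * mode: a small budget of ACKs that are sent immediately rather than
 * delayed.  Refilling the budget when a CWR-marked segment is accepted
 * limits the suppression of delayed ACKs to the case where the sender has
 * actually reacted to ECE.
 */
struct toy_ack_state {
        int quick;   /* how many upcoming ACKs should go out immediately */
};

static void incr_quickack(struct toy_ack_state *st, int budget)
{
        if (st->quick < budget)
                st->quick = budget;
}

static bool ack_now(struct toy_ack_state *st)
{
        if (st->quick > 0) {
                st->quick--;
                return true;   /* quick-ack mode: no delayed-ACK timer */
        }
        return false;          /* normal delayed-ACK path */
}

int main(void)
{
        struct toy_ack_state st = { 0 };

        incr_quickack(&st, 2);   /* e.g. refilled when a CWR segment is accepted */
        for (int i = 0; i < 4; i++)
                printf("segment %d: %s\n", i, ack_now(&st) ? "ack now" : "delay");
        return 0;
}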
thanks,
neal