Message-ID: <DAC7EDC0-D829-4107-AE04-4850118F571D@fb.com>
Date: Tue, 3 Jul 2018 15:10:13 +0000
From: Lawrence Brakmo <brakmo@...com>
To: Neal Cardwell <ncardwell@...gle.com>
CC: Netdev <netdev@...r.kernel.org>, Kernel Team <Kernel-team@...com>,
"Blake Matheny" <bmatheny@...com>, Alexei Starovoitov <ast@...com>,
Yuchung Cheng <ycheng@...gle.com>,
Steve Ibanez <sibanez@...nford.edu>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH net-next v2 0/2] tcp: fix high tail latencies in DCTCP
On 7/2/18, 5:52 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
On Mon, Jul 2, 2018 at 5:39 PM Lawrence Brakmo <brakmo@...com> wrote:
>
> We have observed high tail latencies when using DCTCP for RPCs as
> compared to using Cubic. For example, in one setup there are 2 hosts
> sending to a 3rd one, with each sender having 3 flows (1 stream,
> 1 1MB back-to-back RPCs and 1 10KB back-to-back RPCs). The following
> table shows the 99% and 99.9% latencies for both Cubic and dctcp:
>
>              Cubic 99%  Cubic 99.9%  dctcp 99%  dctcp 99.9%
>   1MB RPCs      2.6ms        5.5ms       43ms        208ms
>   10KB RPCs     1.1ms        1.3ms       53ms        212ms
>
> Looking at tcpdump traces showed that there are two causes for the
> latency.
>
> 1) RTOs caused by the receiver sending a dup ACK and not ACKing
> the last (and only) packet sent.
> 2) Delaying ACKs when the sender has a cwnd of 1, so everything
> pauses for the duration of the delayed ACK.
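>
> For scale, the relevant timer bounds (from include/net/tcp.h) are:
>
>   #define TCP_RTO_MIN     ((unsigned)(HZ/5))   /* minimum RTO: ~200 ms      */
>   #define TCP_DELACK_MAX  ((unsigned)(HZ/5))   /* max ACK delay: ~200 ms    */
>   #define TCP_DELACK_MIN  ((unsigned)(HZ/25))  /* min ACK delay: ~40 ms     */
>
> which is consistent with the ~200ms 99.9% tails above (RTO / maximum
> delayed ACK) and the ~40-50ms 99% tails (minimum delayed ACK).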
>
> The first patch fixes the cause of the dup ACKs: DCTCP state was not
> being updated when an ACK that was initially delayed was later sent
> with a data packet.
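>
> As an illustration of the first patch's idea: generate the existing
> CA_EVENT_NON_DELAYED_ACK congestion-control event when a scheduled
> (possibly delayed) ACK goes out on a data packet, so DCTCP can clear
> its delayed-ACK bookkeeping. A rough sketch only, not necessarily the
> exact code in the patch:
>
>   /* net/ipv4/tcp_output.c (sketch) */
>   static void tcp_event_ack_sent(struct sock *sk, unsigned int pkts)
>   {
>           /* If an ACK was scheduled (it may have been a delayed ACK that
>            * is now riding on data), tell the congestion control module so
>            * that DCTCP can clear ca->delayed_ack_reserved.
>            */
>           if (inet_csk_ack_scheduled(sk))
>                   tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
>
>           tcp_dec_quickack_mode(sk, pkts);
>           inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
>   }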
>
> The second patch ensures that an ACK is sent immediately when a
> CWR-marked packet arrives.
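>
> A minimal sketch of that idea, assuming the change lives in
> tcp_ecn_accept_cwr() and reuses the existing quick-ack machinery (the
> mechanism in the actual patch may differ):
>
>   /* net/ipv4/tcp_input.c (sketch) */
>   static void tcp_ecn_accept_cwr(struct sock *sk, const struct sk_buff *skb)
>   {
>           struct tcp_sock *tp = tcp_sk(sk);
>
>           if (tcp_hdr(skb)->cwr) {
>                   tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
>
>                   /* The sender has reacted to congestion and may be down
>                    * to a cwnd of 1, so do not sit on a delayed ACK:
>                    * acknowledge this packet right away.
>                    */
>                   tcp_enter_quickack_mode(sk, 1);
>           }
>   }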
>
> With the patches the latencies for DCTCP now look like:
>
>              dctcp 99%  dctcp 99.9%
>   1MB RPCs      5.8ms        6.9ms
>   10KB RPCs     146us        203us
>
> Note that while the 1MB RPC tail latencies are higher than Cubic's,
> the 10KB latencies are much smaller than Cubic's. These patches fix
> issues on the receiver, but tcpdump traces indicate there is an
> opportunity to also fix an issue at the sender that adds about 3ms
> to the tail latencies.
>
> The following trace shows the issue that triggers an RTO (fixed by these patches):
>
> Host A sends the last packets of the request
> Host B receives them, and the last packet is marked with congestion (CE)
> Host B sends ACKs for packets not marked with congestion
> Host B sends data packet with reply and ACK for packet marked with
> congestion (TCP flag ECE)
> Host A receives ACKs with no ECE flag
> Host A receives data packet with ACK for the last packet of request
> and which has TCP ECE bit set
> Host A sends 1st data packet of the next request with TCP flag CWR
> Host B receives the packet (as seen in tcpdump at B), no CE flag
> Host B sends a dup ACK that also has the TCP ECE flag
> Host A RTO timer fires!
> Host A retransmits the packet
> Host A receives an ACK for everything it has sent (i.e. Host B
> did receive 1st packet of request)
> Host A sends more packets…
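>
> For reference, the dup ACK above comes from DCTCP's receiver-side CE
> state tracking. Paraphrasing the pre-fix net/ipv4/tcp_dctcp.c (check
> the tree for the exact code): when the CE state flips while DCTCP
> believes a delayed ACK is still outstanding, it sends an ACK for the
> previous rcv_nxt. Because the bug fixed by the first patch leaves
> delayed_ack_reserved stale, this fires even though no ACK is pending:
>
>   static void dctcp_ce_state_1_to_0(struct sock *sk)
>   {
>           struct dctcp *ca = inet_csk_ca(sk);
>           struct tcp_sock *tp = tcp_sk(sk);
>
>           if (ca->ce_state && ca->delayed_ack_reserved) {
>                   u32 tmp_rcv_nxt = tp->rcv_nxt;
>
>                   /* Send the "previous" ACK, with ECE set, for the old CE
>                    * state. With delayed_ack_reserved stale, this is the
>                    * dup ACK that does not cover the CWR-marked packet.
>                    */
>                   tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
>                   tp->rcv_nxt = ca->prior_rcv_nxt;
>                   tcp_send_ack(sk);
>                   tp->rcv_nxt = tmp_rcv_nxt;
>           }
>
>           ca->prior_rcv_nxt = tp->rcv_nxt;
>           ca->ce_state = 0;
>           tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
>   }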
>
> [PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent
> [PATCH net-next v2 2/2] tcp: ack immediately when a cwr packet
>
> net/ipv4/tcp_input.c | 16 +++++++++++-----
> net/ipv4/tcp_output.c | 4 ++--
> 2 files changed, 13 insertions(+), 7 deletions(-)
Thanks, Larry. Just for context, can you please let us know whether
your tests included zero, one, or both of Eric's recent commits
(listed below) that tuned the number of ACKs after ECN events? (Or
maybe the tests were literally using a net-next kernel?) Just wanted
to get a better handle on any possible interactions there.
Thanks!
neal
Yes, my test kernel includes both patches listed below. BTW, I will send a new patch where I move the call to tcp_incr_quickack from tcp_ecn_check_ce to tcp_ecn_accept_cwr.
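
For context, the call I mean is the quick-ack forcing in the CE branch of __tcp_ecn_check_ce() (tcp_ecn_check_ce() is the wrapper that checks TCP_ECN_OK, and tcp_enter_quickack_mode() is what ends up calling tcp_incr_quickack()). Roughly, with Eric's two patches below applied, that branch looks like this (paraphrased, check net-next for the exact code); the marked call is what would move into tcp_ecn_accept_cwr():

  static void __tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb)
  {
          struct tcp_sock *tp = tcp_sk(sk);

          switch (TCP_SKB_CB(skb)->ip_dsfield & INET_ECN_MASK) {
          case INET_ECN_NOT_ECT:
                  /* If ECT is not set on a segment, and we already saw ECT
                   * on a previous segment, it is probably a retransmit.
                   */
                  if (tp->ecn_flags & TCP_ECN_SEEN)
                          tcp_enter_quickack_mode(sk, 2);
                  break;
          case INET_ECN_CE:
                  if (tcp_ca_needs_ecn(sk))
                          tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);

                  if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
                          /* This is the quick-ack forcing that would move
                           * into tcp_ecn_accept_cwr().
                           */
                          tcp_enter_quickack_mode(sk, 2);
                          tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
                  }
                  tp->ecn_flags |= TCP_ECN_SEEN;
                  break;
          default:
                  if (tcp_ca_needs_ecn(sk))
                          tcp_ca_event(sk, CA_EVENT_ECN_NO_CE);
                  tp->ecn_flags |= TCP_ECN_SEEN;
                  break;
          }
  }
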
---
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=522040ea5fdd1c33bbf75e1d7c7c0422b96a94ef
commit 522040ea5fdd1c33bbf75e1d7c7c0422b96a94ef
Author: Eric Dumazet <edumazet@...gle.com>
Date: Mon May 21 15:08:57 2018 -0700
tcp: do not aggressively quick ack after ECN events
ECN signals currently forces TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.
We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.
This should reduce the extra load noticed in DCTCP environments,
after congestion events.
This is part 2 of our effort to reduce pure ACK packets.
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Acked-by: Soheil Hassas Yeganeh <soheil@...gle.com>
Acked-by: Yuchung Cheng <ycheng@...gle.com>
Acked-by: Neal Cardwell <ncardwell@...gle.com>
Signed-off-by: David S. Miller <davem@...emloft.net>
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=15ecbe94a45ef88491ca459b26efdd02f91edb6d
commit 15ecbe94a45ef88491ca459b26efdd02f91edb6d
Author: Eric Dumazet <edumazet@...gle.com>
Date: Wed Jun 27 08:47:21 2018 -0700
tcp: add one more quick ack after after ECN events
Larry Brakmo proposal ( https://patchwork.ozlabs.org/patch/935233/
tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink
about our recent patch removing ~16 quick acks after ECN events.
tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
but in the case the sender cwnd was lowered to 1, we do not want
to have a delayed ack for the next packet we will receive.
Fixes: 522040ea5fdd ("tcp: do not aggressively quick ack after ECN events")
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Reported-by: Neal Cardwell <ncardwell@...gle.com>
Cc: Lawrence Brakmo <brakmo@...com>
Acked-by: Neal Cardwell <ncardwell@...gle.com>
Signed-off-by: David S. Miller <davem@...emloft.net>