[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAOJe8K0aD97V6+Sa6Mdx4CfmuCc1UNFV1n8VLFjvOoU_7Xxm9g@mail.gmail.com>
Date: Wed, 24 Jun 2020 11:37:58 +0300
From: Denis Kirjanov <kda@...ux-powerpc.org>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Netdev <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>,
"Scheffenegger, Richard" <Richard.Scheffenegger@...app.com>,
Bob Briscoe <ietf@...briscoe.net>
Subject: Re: [PATCH] tcp: don't ignore ECN CWR on pure ACK
On 6/23/20, Neal Cardwell <ncardwell@...gle.com> wrote:
> On Tue, Jun 23, 2020 at 10:54 AM Denis Kirjanov <kda@...ux-powerpc.org>
> wrote:
>>
>> there is a problem with the CWR flag set in an incoming ACK segment
>> and it leads to the situation when the ECE flag is latched forever
>>
>> the following packetdrill script shows what happens:
>>
>> // Stack receives incoming segments with CE set
>> +0.1 <[ect0] . 11001:12001(1000) ack 1001 win 65535
>> +0.0 <[ce] . 12001:13001(1000) ack 1001 win 65535
>> +0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535
>>
>> // Stack repsonds with ECN ECHO
>> +0.0 >[noecn] . 1001:1001(0) ack 12001
>> +0.0 >[noecn] E. 1001:1001(0) ack 13001
>> +0.0 >[noecn] E. 1001:1001(0) ack 14001
>>
>> // Write a packet
>> +0.1 write(3, ..., 1000) = 1000
>> +0.0 >[ect0] PE. 1001:2001(1000) ack 14001
>>
>> // Pure ACK received
>> +0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535
>>
>> // Since CWR was sent, this packet should NOT have ECE set
>>
>> +0.1 write(3, ..., 1000) = 1000
>> +0.0 >[ect0] P. 2001:3001(1000) ack 14001
>> // but Linux will still keep ECE latched here, with packetdrill
>> // flagging a missing ECE flag, expecting
>> // >[ect0] PE. 2001:3001(1000) ack 14001
>> // in the script
>>
>> In the situation above we will continue to send ECN ECHO packets
>> and trigger the peer to reduce the congestion window.
>>
>> Signed-off-by: Denis Kirjanov <denis.kirjanov@...e.com>
>> ---
>> net/ipv4/tcp_input.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 12fda8f27b08..e00b88c8598d 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3696,8 +3696,13 @@ static int tcp_ack(struct sock *sk, const struct
>> sk_buff *skb, int flag)
>> &rexmit);
>> }
>>
>> - if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
>> + if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
>> + /* If we miss the CWR flag then ECE will be
>> + * latched forever.
>> + */
>> + tcp_ecn_accept_cwr(sk, skb);
>> sk_dst_confirm(sk);
>> + }
>>
>> delivered = tcp_newly_delivered(sk, delivered, flag);
>> lost = tp->lost - lost; /* freshly marked lost */
>> @@ -4800,8 +4805,6 @@ static void tcp_data_queue(struct sock *sk, struct
>> sk_buff *skb)
>> skb_dst_drop(skb);
>> __skb_pull(skb, tcp_hdr(skb)->doff * 4);
>>
>> - tcp_ecn_accept_cwr(sk, skb);
>> -
>> tp->rx_opt.dsack = 0;
>>
>> /* Queue data for delivery to the user.
>> --
>> 2.27.0
>>
>
> Hi Denis -
>
> Thanks for this patch!
Hi Neal,
> A few thoughts:
>
> (1) Richard Scheffenegger (cc-ed) raised this issue in January.
> IMHO The Linux TCP code is RFC-compliant here. If you have a
> sender that is sending CWR on pure ACKs, then that sender is
> violating RFC3168, which specifies that CWR should only sent on
> data segments, so that the sender can be sure that there is a
> cwnd reduction even if a packet with CWR set is lost:
>
> RFC3168 says:
>
> """
> When an ECN-Capable TCP sender reduces its congestion window for
> any reason (because of a retransmit timeout, a Fast Retransmit,
> or in response to an ECN Notification), the TCP sender sets the
> CWR flag in the TCP header of the first new data packet sent
> after the window reduction. If that data packet is dropped in
> the network, then the
> ...
> sending TCP will have to reduce the congestion window again and
> retransmit the dropped packet.
>
> We ensure that the "Congestion Window Reduced" information is
> reliably delivered to the TCP receiver. This comes about from
> the fact that if the new data packet carrying the CWR flag is
> dropped, then the TCP sender will have to again reduce its
> congestion window, and send another new data packet with the CWR
> flag set. Thus, the CWR bit in the TCP header SHOULD NOT be set
> on retransmitted packets.
>
> When the TCP data sender is ready to set the CWR bit after
> reducing the congestion window, it SHOULD set the CWR bit only on
> the first new data packet that it transmits.
> """
>
> (2) Even given (1), I would agree with Richard's point that TCP
> receivers should accept CWR on pure ACKs, per the rationale of
> Postel's robustness principle ("Be liberal in what you accept,
> and conservative in what you send.") Here we should be
> liberal and accept the CWR in the pure ACK, particularly
> because we know that at least one class of widely-deployed TCP
> stacks (*BSD stacks) do this.
>
> (3) Given (2), I don't think this is the proper place to accept a
> CWR. That point in the code is not reached by a connection
> that is only currently receiving data, because of the lines
> above that say:
>
> if (!prior_packets)
> goto no_queue;
Right, here we don't loose a segment
>
> Accepting CWR is something a data receiver does (to respond to
> a data sender that is signalling that it has slowed down). So
> even though I agree we should make the code more
> liberal/robust in acepting CWR on pure ACK, I don't think this
> is the right spot to insert the code.
>
> (4) I would say this is an interoperability bug fix, so this
> should be explicitly targetd to the "net" branch.
Sure
>
> (5) If you are able and have time, it would be nice to post the
> full packetdrill script, either in the commit message, or as a
> pull request to the packetdrill github project, so that the
> Linux community may benefit from the full test case.
Initially I thought that we do have packetdrill scripts in the kernel
but we don't.
Sure, I'll post a patch. Actually I've sent a different patch before
(regarding gcc 10 issue) and looks like the mailing list doesn't work.
Do I have to do that through gihub?
>
> Putting all that together, I think this patch should add a
> comment that this code snippet is accepting non-RFC3168-compliant
> behavior for the sake of robustness to known deployed stacks, and
> should put that code in a place reached by data receivers. I
> would suggest perhaps something like:
Thanks for review!
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 12fda8f27b08..717475b6f5c3 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3665,6 +3665,13 @@ static int tcp_ack(struct sock *sk, const
> struct sk_buff *skb, int flag)
> tcp_in_ack_event(sk, ack_ev_flags);
> }
>
> + /* An RFC3168-compliant sender should only set CWR on data
> segments.
> + * ("it SHOULD set the CWR bit only on the first new data packet
> that
> + * it transmits." However, we accept CWR on pure ACKs to be more
> robust
> + * with widely-deployed TCP implementations that do this.
> + */
> + tcp_ecn_accept_cwr(sk, skb);
> +
> /* We passed data and got it acked, remove any soft error
> * log. Something worked...
> */
> @@ -4800,8 +4807,6 @@ static void tcp_data_queue(struct sock *sk,
> struct sk_buff *skb)
> skb_dst_drop(skb);
> __skb_pull(skb, tcp_hdr(skb)->doff * 4);
>
> - tcp_ecn_accept_cwr(sk, skb);
> -
> tp->rx_opt.dsack = 0;
>
> /* Queue data for delivery to the user.
> ---
>
> best,
> neal
>
Powered by blists - more mailing lists