[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK6E8=ed3Lt4seZNv+veaVT-U893FVHqWGWhzwfEXyeb-yAkhA@mail.gmail.com>
Date: Mon, 26 Mar 2018 19:07:12 -0700
From: Yuchung Cheng <ycheng@...gle.com>
To: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc: netdev <netdev@...r.kernel.org>,
Neal Cardwell <ncardwell@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
Sergei Shtylyov <sergei.shtylyov@...entembedded.com>
Subject: Re: [PATCH v3 net 1/5] tcp: feed correct number of pkts acked to cc
modules also in recovery
On Tue, Mar 13, 2018 at 3:25 AM, Ilpo Järvinen
<ilpo.jarvinen@...sinki.fi> wrote:
>
> A miscalculation for the number of acknowledged packets occurs during
> RTO recovery whenever SACK is not enabled and a cumulative ACK covers
> any non-retransmitted skbs. The reason is that pkts_acked value
> calculated in tcp_clean_rtx_queue is not correct for slow start after
> RTO as it may include segments that were not lost and therefore did
> not need retransmissions in the slow start following the RTO. Then
> tcp_slow_start will add the excess into cwnd bloating it and
> triggering a burst.
>
> Instead, we want to pass only the number of retransmitted segments
> that were covered by the cumulative ACK (and potentially newly sent
> data segments too if the cumulative ACK covers that far).
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
> ---
> net/ipv4/tcp_input.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9a1b3c1..4a26c09 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3027,6 +3027,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
> long seq_rtt_us = -1L;
> long ca_rtt_us = -1L;
> u32 pkts_acked = 0;
> + u32 rexmit_acked = 0;
> + u32 newdata_acked = 0;
> u32 last_in_flight = 0;
> bool rtt_update;
> int flag = 0;
> @@ -3056,8 +3058,10 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
> }
>
> if (unlikely(sacked & TCPCB_RETRANS)) {
> - if (sacked & TCPCB_SACKED_RETRANS)
> + if (sacked & TCPCB_SACKED_RETRANS) {
> tp->retrans_out -= acked_pcount;
> + rexmit_acked += acked_pcount;
> + }
> flag |= FLAG_RETRANS_DATA_ACKED;
> } else if (!(sacked & TCPCB_SACKED_ACKED)) {
> last_ackt = skb->skb_mstamp;
> @@ -3070,6 +3074,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
> reord = start_seq;
> if (!after(scb->end_seq, tp->high_seq))
> flag |= FLAG_ORIG_SACK_ACKED;
> + else
> + newdata_acked += acked_pcount;
> }
>
> if (sacked & TCPCB_SACKED_ACKED) {
> @@ -3151,6 +3157,14 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
> }
>
> if (tcp_is_reno(tp)) {
> + /* Due to discontinuity on RTO in the artificial
> + * sacked_out calculations, TCP must restrict
> + * pkts_acked without SACK to rexmits and new data
> + * segments
> + */
> + if (icsk->icsk_ca_state == TCP_CA_Loss)
> + pkts_acked = rexmit_acked + newdata_acked;
> +
My understanding is there are two problems
1) your fix: the reordering logic in tcp-remove_reno_sacks requires
precise cumulatively acked count, not newly acked count?
2) current code: pkts_acked can substantially over-estimate the newly
delivered pkts in both SACK and non-SACK cases. For example, let's say
99/100 packets are already sacked, and the next ACK acks 100 pkts.
pkts_acked == 100 but really only one packet is delivered. It's wrong
to inform congestion control that 100 packets have just delivered.
AFAICT, the CCs that have pkts_acked callbacks all treat pkts_acked as
the newly delivered packets.
A better fix for both SACK and non-SACK, seems to be moving
ca_ops->pkts_acked into tcp_cong_control, where the "acked_sacked" is
calibrated? this is what BBR is currently doing to avoid these pitfalls.
> tcp_remove_reno_sacks(sk, pkts_acked);
> } else {
> int delta;
> --
> 2.7.4
>
Powered by blists - more mailing lists