Message-ID: <CADVnQymKZAc+SFD7X-FoP9UwHgB2MBCY48=dL8mH6XuKumkx4g@mail.gmail.com>
Date: Wed, 26 Jul 2017 12:43:36 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Willy Tarreau <w@....eu>
Cc: Eric Dumazet <eric.dumazet@...il.com>, Klavs Klavsen <kl@...n.dk>,
Netdev <netdev@...r.kernel.org>,
Yuchung Cheng <ycheng@...gle.com>,
Nandita Dukkipati <nanditad@...gle.com>
Subject: Re: TCP fast retransmit issues
On Wed, Jul 26, 2017 at 04:08:19PM +0200, Klavs Klavsen wrote:
> Grabbed on both ends.
>
> http://blog.klavsen.info/fast-retransmit-problem-junos-linux (updated to new
> dump - from client scp'ing)
> http://blog.klavsen.info/fast-retransmit-problem-junos-linux-receiving-side
> (receiving host)
Looking at some time-sequence plots of the sender trace (attached),
and thinking about the Linux TCP sender code, it looks like there are at
least two interesting things going on:
(1) Because the connection negotiated SACK, the Linux TCP sender does
not get to its tcp_add_reno_sack() code to count dupacks and enter
fast recovery on the 3rd dupack. The sender keeps waiting for specific
packets to be SACKed that would signal that something has probably
been lost. We could probably mitigate this by having the sender turn
off SACK once it sees SACKed ranges that are completely invalid (way
out of window). Then it should use the old non-SACK "Recovery on 3rd
dupack" path.
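A rough sketch of the mitigation proposed in (1), written as a Python model rather than kernel code. Everything here is an assumption made for illustration: the names (Sender, sack_block_plausible, on_dupack), the plausibility test, and the use of max_window as the "way out of window" bound are hypothetical and are not the checks the Linux sender actually performs.

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 32          # TCP sequence numbers wrap at 2^32
HALF = 1 << 31
DUPACK_THRESHOLD = 3       # classic "recovery on 3rd dupack"


def seq_diff(a, b):
    """Signed distance a - b in 32-bit sequence space."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= HALF else d


def sack_block_plausible(start, end, snd_una, snd_nxt, max_window):
    """Hypothetical sanity check for one SACK block."""
    if seq_diff(end, start) <= 0:
        return False       # empty or reversed block
    if seq_diff(end, snd_nxt) > 0:
        return False       # claims data that was never sent
    if seq_diff(snd_una, start) > max_window:
        return False       # starts far below the cumulative ACK
    return True


@dataclass
class Sender:
    snd_una: int
    snd_nxt: int
    max_window: int
    sack_ok: bool = True
    dupacks: int = 0
    in_recovery: bool = False

    def on_dupack(self, sack_blocks):
        """Process one duplicate ACK carrying the given SACK blocks."""
        if self.sack_ok and any(
            not sack_block_plausible(s, e, self.snd_una, self.snd_nxt,
                                     self.max_window)
            for s, e in sack_blocks
        ):
            # Stop trusting SACK for this connection (assumed policy).
            self.sack_ok = False
        if not self.sack_ok:
            # Non-SACK path: count dupacks, recover on the 3rd.
            self.dupacks += 1
            if self.dupacks >= DUPACK_THRESHOLD:
                self.in_recovery = True
```

With this model, three dupacks that each carry a wildly out-of-window block would turn SACK off on the first one and enter recovery on the third, while dupacks carrying in-window blocks leave the SACK path untouched.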
(2) It looks like there is a bug in the sender code where it seems to
be repeatedly using a TLP timer firing 211ms after every ACK is
received to transmit another TLP probe (a new packet in this case).
Somehow these weird invalid SACKs seem to be triggering a code path
that makes us think we can send another TLP, when we probably should
be firing an RTO. That's my interpretation, anyway. I will try to
reproduce this with packetdrill.
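To make the expectation in (2) concrete, here is a toy Python model of the timer choice. It is only a sketch under assumptions: ProbeTimer and its fields are hypothetical names, the rule "one probe outstanding, then fall back to the RTO" is a simplification of the intended TLP behavior, and the 400ms RTO is an arbitrary illustrative value (only the 211ms figure comes from the trace).

```python
from dataclasses import dataclass, field


@dataclass
class ProbeTimer:
    pto_ms: int                     # probe timeout (TLP)
    rto_ms: int                     # retransmission timeout
    tlp_outstanding: bool = False   # a probe was sent and not yet covered
    log: list = field(default_factory=list)

    def arm(self):
        """Pick the next timer: RTO if a probe is already outstanding."""
        if self.tlp_outstanding:
            return ("RTO", self.rto_ms)
        return ("TLP", self.pto_ms)

    def on_timer(self, kind):
        """Record what the sender does when the given timer fires."""
        if kind == "TLP":
            self.tlp_outstanding = True
            self.log.append("send probe")
        else:
            self.log.append("RTO retransmit")

    def on_ack(self, advanced):
        """Re-arm after an ACK; only forward progress allows a new probe."""
        if advanced:
            self.tlp_outstanding = False
        return self.arm()


t = ProbeTimer(pto_ms=211, rto_ms=400)
first = t.arm()                      # TLP timer armed
t.on_timer(first[0])                 # probe goes out
after_dupack = t.on_ack(advanced=False)
```

In this model an ACK that does not advance the cumulative ACK point leaves the RTO armed after the first probe. The behavior in the trace looks instead like the TLP timer being re-armed on every such ACK.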
neal
Download attachment "linux-tcp-fr-issues-2017-07-26-zoomed-out.png" of type "image/png" (37196 bytes)
Download attachment "linux-tcp-fr-issues-2017-07-26-zoomed-in-1.png" of type "image/png" (39867 bytes)
Download attachment "linux-tcp-fr-issues-2017-07-26-zoomed-in-2.png" of type "image/png" (37358 bytes)