Message-ID: <CADVnQykQszMkHq+KvebHcWinc319H=NMjnP+bX5miFpN8tPPzw@mail.gmail.com>
Date: Mon, 31 Jul 2017 23:17:01 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Willy Tarreau <w@....eu>
Cc: Eric Dumazet <eric.dumazet@...il.com>, Klavs Klavsen <kl@...n.dk>,
Netdev <netdev@...r.kernel.org>,
Yuchung Cheng <ycheng@...gle.com>,
Nandita Dukkipati <nanditad@...gle.com>
Subject: Re: TCP fast retransmit issues
On Fri, Jul 28, 2017 at 6:54 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
> On Wed, Jul 26, 2017 at 3:02 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
>> On Wed, Jul 26, 2017 at 2:38 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
>>> Yeah, it looks like I can reproduce this issue with (1) bad sacks
>>> causing repeated TLPs, and (2) TLP timers being pushed out to later
>>> times due to incoming data. Scripts are attached.
>>
>> I'm testing a fix of only scheduling a TLP if (flag & FLAG_DATA_ACKED)
>> is true...
>
> An update for the TLP aspect of this thread: our team has a proposed
> fix for this RTO/TLP reschedule issue that we have reviewed internally
> and tested with our packetdrill test suite, including some new tests.
> The basic approach in the fix is as follows:
>
> a) only reschedule the xmit timer once per ACK
>
> b) only reschedule the xmit timer if tcp_clean_rtx_queue() deems this
> safe (a packet was cumulatively ACKed, or we got a SACK for a
> packet that was sent before the most recent retransmit of the write
> queue head).
>
> After further review and testing we will post it. Hopefully next week.

The timer patches are upstream for review for the "net" branch:
https://patchwork.ozlabs.org/patch/796057/
https://patchwork.ozlabs.org/patch/796058/
https://patchwork.ozlabs.org/patch/796059/
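
For readers following the thread, the two rules quoted above, (a) and (b), can be illustrated with a minimal sketch in C. This is not the code in the patches linked above. The struct names, field names, and function names below are all hypothetical and only model the decision. It assumes the ACK-processing path has already summarized what the ACK covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ACK summary; these fields are illustrative only
 * and do not correspond to real kernel structures. */
struct ack_summary {
    bool     cumulatively_acked; /* a packet was cumulatively ACKed */
    bool     newly_sacked;       /* a packet was newly SACKed by this ACK */
    uint64_t sacked_sent_us;     /* send time of that SACKed packet */
    uint64_t head_rexmit_us;     /* most recent retransmit time of the
                                  * write queue head */
};

/* Hypothetical connection state: counts timer rearms for illustration. */
struct conn_sketch {
    unsigned int timer_rearms;
};

/* Rule (b): rearming is considered safe only if a packet was
 * cumulatively ACKed, or a SACKed packet was sent before the most
 * recent retransmit of the write queue head. */
bool ack_permits_rearm(const struct ack_summary *ack)
{
    if (ack->cumulatively_acked)
        return true;
    return ack->newly_sacked &&
           ack->sacked_sent_us < ack->head_rexmit_us;
}

/* Rule (a): the decision is made once, and the timer is rearmed at
 * most once, at a single point per incoming ACK. */
void process_ack(struct conn_sketch *c, const struct ack_summary *ack)
{
    bool rearm = ack_permits_rearm(ack);

    /* ... the rest of ACK processing would happen here ... */

    if (rearm)
        c->timer_rearms++; /* stands in for rescheduling the xmit timer */
}
```

In this sketch, an ACK that only SACKs data sent after the head's latest retransmit, or that carries no new ACK or SACK information at all, leaves the timer untouched, so incoming traffic of that kind cannot keep pushing the timer out.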

Again, thank you for reporting this, and thanks for the packet trace!

neal