netdev - Re: TCP stall issue


Message-Id: <C5332AE4-DFAF-4127-91D1-A9108877507A@gmail.com>
Date:   Wed, 24 Feb 2021 11:03:10 +0100
From:   Gil Pedersen <kanongil@...il.com>
To:     Neal Cardwell <ncardwell@...gle.com>
Cc:     David Miller <davem@...emloft.net>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        dsahern@...nel.org, Netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: Re: TCP stall issue



> On 23 Feb 2021, at 16.41, Neal Cardwell <ncardwell@...gle.com> wrote:
> 
> On Tue, Feb 23, 2021 at 5:13 AM Gil Pedersen <kanongil@...il.com> wrote:
>> 
>> Hi,
>> 
>> I am investigating a TCP stall that can occur when sending to an Android device (kernel 4.9.148) from an Ubuntu server running kernel 5.11.0.
>> 
>> The issue seems to be that RACK is not applied when a D-SACK (with SACK) is received on the server after an RTO re-transmission (CA_Loss state). Here the re-transmitted segment is considered to be already delivered and loss undo logic is applied. Then nothing is re-transmitted until the next RTO, where the next segment is sent and the same thing happens again. This causes the re-transmitted segments to be delivered at a rate of ~1 per second, so a burst loss of e.g. 20 segments causes a 20+ second stall. I would expect RACK to kick in long before this happens.
>> 
>> Note the D-SACK should not be considered spurious, as the TSecr value matches the re-transmission TSval.
>> 
>> Also, the Android receiver is definitely sending strange D-SACKs that do not properly advance the ACK number to include received segments. However, I can't control it and need to fix it on the server by quickly re-transmitting the segments. The connection itself is functional. If the client makes a request to the server in this state, it can respond and the client will receive any segments sent in reply.
>> 
>> I can see from counters that TcpExtTCPLossUndo & TcpExtTCPSackFailures are incremented on the server when this happens.
>> The issue appears with F-RTO both enabled and disabled. It also appears with both BBR and Reno.
>> 
>> Any idea of why this happens, or suggestions on how to debug the issue further?
>> 
>> /Gil
> 
> Thanks for the detailed report! It sounds like you have a trace. Can
> you please attach (or post the URL of) a binary tcpdump .pcap trace
> that illustrates the problem, to make sure we can understand and
> reproduce the issue?
> 
> thanks,
> neal

Sure, I attached a trace from the server that should illustrate the issue.

The trace is cut from a longer flow with the server at 188.120.85.11 and a client window scaling factor of 256.
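The scaling factor matters because a trace cut from the middle of a flow does not contain the SYN exchange, so an analysis tool cannot learn the window scale by itself. As a rough illustration only, the sketch below converts a raw 16-bit window field into bytes using the factor of 256 stated above. The function names are hypothetical and are not taken from any tool.

```python
def shift_count(scale_factor: int) -> int:
    """Return the window scale shift count for a power-of-two factor."""
    # A window scale factor is always 2**shift, so reject anything else.
    if scale_factor <= 0 or scale_factor & (scale_factor - 1):
        raise ValueError("scale factor must be a positive power of two")
    return scale_factor.bit_length() - 1


def effective_window(raw_window: int, scale_factor: int = 256) -> int:
    """Return the advertised receive window in bytes.

    raw_window is the 16-bit window field from the TCP header of a
    non-SYN segment; scale_factor defaults to the 256 quoted for this
    trace (an assumption for any other capture).
    """
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("raw window must fit in 16 bits")
    return raw_window << shift_count(scale_factor)
```

For example, a raw window field of 1000 in this trace would correspond to 256000 bytes.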

Packet 78 is a TLP, followed by a delayed DUPACK with a SACK from the client.
The SACK triggers a single-segment fast re-transmit, with a (seemingly ignored?) D-SACK in packet 81.
The first RTO happens at packet 82.
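
The two counters mentioned earlier in the thread (TcpExtTCPLossUndo and TcpExtTCPSackFailures) can be sampled on the server before and after reproducing the stall. The following is a minimal sketch, assuming the usual /proc/net/netstat layout of alternating header and value lines. The helper names are hypothetical, and the code only parses text handed to it.

```python
def parse_netstat(text: str) -> dict:
    """Parse /proc/net/netstat-style text into {"TcpExtName": value}."""
    counters = {}
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "TcpExt: Name1 Name2 ..." then "TcpExt: 1 2 ...".
    for header, values in zip(rows[0::2], rows[1::2]):
        if header[0] != values[0] or len(header) != len(values):
            continue  # skip a pair that does not match up
        prefix = header[0].rstrip(":")
        for name, value in zip(header[1:], values[1:]):
            counters[prefix + name] = int(value)
    return counters


def counter_delta(before: dict, after: dict, names) -> dict:
    """Return how much each named counter grew between two samples."""
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}
```

On the server, the text would typically come from reading /proc/net/netstat once before and once after the stall, and then passing both parsed samples to counter_delta with the two counter names.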


Download attachment "rack-rto-stall.pcap" of type "application/octet-stream" (439417 bytes)