Message-ID: <CAHjP37HqsNxCmAcB-XoXqOOY8dRJTK7XMvheKahNmvC=KQUHNA@mail.gmail.com>
Date: Fri, 3 Nov 2017 09:38:19 -0400
From: Vitaly Davidovich <vitalyd@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev <netdev@...r.kernel.org>
Subject: Re: TCP connection closed without FIN or RST
On Fri, Nov 3, 2017 at 9:00 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>> Hi Eric,
>>
>> Ran a few more tests yesterday with packet captures, including a
>> capture on the client. It turns out that the client stops ack'ing
>> entirely at some point in the conversation - the last advertised
>> client window is not even close to zero (it's actually ~348K). So
>> there's complete radio silence from the client for some reason, even
>> though it does send back ACKs early on in the conversation. So yes,
>> as far as the server is concerned, the client is completely gone, and
>> tcp_retries2 rightfully breaches eventually once the server's retrans
>> go unanswered long enough (and enough times).
>>
>> What's odd though is the packet capture on the client shows the server
>> retrans packets arriving, so it's not like the segments don't reach
>> the client. I'll keep investigating, but if you (or anyone else
>> reading this) knows of circumstances that might cause this, I'd
>> appreciate any tips on where/what to look at.
>
>
> Might be a middlebox issue ? Like firewall connection tracking
> having some kind of timeout if nothing is sent in one direction ?
Yeah, that's certainly possible, although I've not found evidence of
that yet (I've also asked the sysadmins). But it's definitely an
avenue I'm going to walk a bit further down.
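For what it's worth, if it does turn out to be conntrack expiring the
flow, one common workaround is enabling aggressive TCP keepalives so
the middlebox sees periodic traffic even while the application is
idle. A rough Python sketch of what I mean (the knob values are
illustrative, not a recommendation; the TCP_KEEP* options are
Linux-specific):

```python
import socket

def connect_with_keepalive(host, port):
    """Open a TCP connection with keepalives enabled, so a middlebox
    (e.g. firewall conntrack) sees periodic probes on an otherwise
    idle connection instead of timing the flow out."""
    s = socket.create_connection((host, port))
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs; values below are illustrative only.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle secs before first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # secs between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # failed probes before abort
    return s
```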
>
> What output do you have from client side with :
>
> ss -temoi dst <server_ip>
I snipped some irrelevant info, like IP addresses, uid, inode number, etc.
Client before it wakes up - the recvq has been at 125976 for the
entire time it's been sleeping (15 minutes):
State Recv-Q Send-Q
ESTAB 125976 0
skmem:(r151040,rb150000,t0,tb150000,f512,w0,o0,bl0) ts
sack scalable wscale:0,11 rto:208 rtt:4.664/8.781 ato:40 mss:1448
cwnd:10 send 24.8Mbps rcv_rtt:321786 rcv_space:524140
While the server is on its last retrans timer, the client wakes up and
slurps up its recv buffer:
State Recv-Q Send-Q
ESTAB 0 0
skmem:(r0,rb150000,t0,tb150000,f151552,w0,o0,bl0) ts
sack scalable wscale:0,11 rto:208 rtt:4.664/8.781 ato:40 mss:1448
cwnd:10 send 24.8Mbps rcv_rtt:321786 rcv_space:524140
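As an aside, for anyone reading along: the skmem:(...) tuples above
can be decoded mechanically. A small sketch, with field names taken
from my reading of the ss(8) man page (r is rmem_alloc, rb is rcvbuf,
tb is sndbuf, and so on):

```python
import re

# Field meanings per ss(8): r=rmem_alloc, rb=rcvbuf, t=wmem_alloc,
# tb=sndbuf, f=fwd_alloc, w=wmem_queued, o=opt_mem, bl=backlog.
SKMEM_FIELDS = {"r": "rmem_alloc", "rb": "rcvbuf", "t": "wmem_alloc",
                "tb": "sndbuf", "f": "fwd_alloc", "w": "wmem_queued",
                "o": "opt_mem", "bl": "backlog"}

def parse_skmem(s):
    """Parse an ss skmem:(...) string into a {name: bytes} dict."""
    inner = s[s.index("(") + 1 : s.rindex(")")]
    out = {}
    for tok in inner.split(","):
        m = re.match(r"([a-z]+)(\d+)", tok)
        key, val = m.group(1), int(m.group(2))
        out[SKMEM_FIELDS.get(key, key)] = val
    return out
```

So in the first dump, rmem_alloc (151040) is slightly above rcvbuf
(150000) while the client sleeps, and it drops to 0 once the client
drains the queue.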
Here's the cmd output from the server right before the last retrans
timer expires and the socket is aborted. Note that this output is
after the client has drained its recv queue (the output right above):
State Recv-Q Send-Q
ESTAB 0 925272
timer:(on,14sec,15)
skmem:(r0,rb100000,t0,tb1050000,f2440,w947832,o0,bl0) ts sack
scalable wscale:11,0 rto:120000 rtt:9.69/16.482 ato:40 mss:1448 cwnd:1
ssthresh:89 send 1.2Mbps unacked:99 retrans:1/15 lost:99 rcv_rtt:4
rcv_space:28960
Also worth noting: the server's sendq has been at 925272 the entire time.
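For context on the retrans:1/15 above: with the default
tcp_retries2=15, the kernel allows roughly 924.6 seconds (~15.4
minutes) of unanswered retransmissions before aborting, which lines up
with the ~15 minutes the client slept. A sketch of that timeout
computation, modeled on my reading of retransmits_timed_out() in
net/ipv4/tcp_timer.c (the 200ms/120s constants are the kernel's
TCP_RTO_MIN/TCP_RTO_MAX; treat the details as an approximation):

```python
import math

TCP_RTO_MIN = 0.2    # seconds (200 ms)
TCP_RTO_MAX = 120.0  # seconds

def retrans_timeout(boundary, rto_base=TCP_RTO_MIN):
    """Approximate total time allowed for `boundary` retransmissions:
    exponential backoff doubling from rto_base, capped at TCP_RTO_MAX."""
    linear = int(math.log2(TCP_RTO_MAX / rto_base))  # ilog2(600) == 9
    if boundary <= linear:
        # Sum of a pure doubling series: (2^(boundary+1) - 1) * base
        return ((2 << boundary) - 1) * rto_base
    # Doubling phase up to the cap, then linear at TCP_RTO_MAX
    return ((2 << linear) - 1) * rto_base + (boundary - linear) * TCP_RTO_MAX
```

retrans_timeout(15) comes out to 924.6s, and the rto:120000 in the
server dump is consistent with the backoff having long since hit the
TCP_RTO_MAX cap.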
Does anything stand out here? I guess one thing that stands out to me
(but that could be due to my lack of in-depth knowledge of this) is
that the client rcv_space is significantly larger than the recvq.
Thanks Eric!