netdev - Re: TCP connection closed without FIN or RST


Message-ID: <1509568471.3828.50.camel@edumazet-glaptop3.roam.corp.google.com>
Date:   Wed, 01 Nov 2017 13:34:31 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Vitaly Davidovich <vitalyd@...il.com>
Cc:     netdev@...r.kernel.org
Subject: Re: TCP connection closed without FIN or RST

On Wed, 2017-11-01 at 16:25 -0400, Vitaly Davidovich wrote:
> Hi all,
> 
> I'm seeing some puzzling TCP behavior that I'm hoping someone on this
> list can shed some light on.  Apologies if this isn't the right forum
> for this type of question.  But here goes anyway :)
> 
> I have client and server x86-64 linux machines with the 4.1.35 kernel.
> I set up the following test/scenario:
> 
> 1) Client connects to the server and requests a stream of data.  The
> server (written in Java) starts to send data.
> 2) Client then goes to sleep for 15 minutes (I'll explain why below).
> 3) Naturally, the server's sendq fills up and it blocks on a write() syscall.
> 4) Similarly, the client's recvq fills up.
> 5) After 15 minutes the client wakes up and reads the data off the
> socket fairly quickly - the recvq is fully drained.
> 6) At about the same time, the server's write() fails with ETIMEDOUT.
> The server then proceeds to close() the socket.
> 7) The client, however, remains forever stuck in its read() call.
> 
> When the client is stuck in read(), netstat on the server does not
> show the tcp connection - it's gone.  On the client, netstat shows the
> connection with 0 recv (and send) queue size and in ESTABLISHED state.
> 
> I have done a packet capture (using tcpdump) on the server, and
> expected to see either a FIN or an RST packet sent to the client -
> neither of these is present.  What is present, however, is a bunch of
> retrans from the server to the client, with what appears to be
> exponential backoff.  However, the conversation just stops around the
> time when the ETIMEDOUT error occurred.  I do not see any attempt to
> abort or gracefully shut down the TCP stream.
> 
> When I strace the server thread that was blocked on write(), I do see
> the ETIMEDOUT error from write(), followed by a close() on the socket
> fd.
> 
> Would anyone possibly know what could cause this? Or suggestions on
> how to troubleshoot further? In particular, are there any known cases
> where a FIN or RST wouldn't be sent after a write() times out due to
> too many retrans? I believe this might be related to the tcp_retries2
> behavior (the system is configured with the default value of 15),
> where too many retrans attempts will cause write() to error with a
> timeout.  My understanding is that this shouldn't do anything to the
> state of the socket on its own - it should stay in the ESTABLISHED
> state.  But then presumably a close() should start the shutdown state
> machine by sending a FIN packet to the client and entering FIN_WAIT1
> on the server.
> 
> Ok, as to why I'm doing a test where the client sleeps for 15 minutes
> - this is an attempt at reproducing a problem that I saw with a client
> that wasn't sleeping intentionally, but otherwise the situation
> appeared to be the same - the server write() blocked, eventually timed
> out, server tcp session was gone, but client was stuck in a read()
> syscall with the tcp session still in ESTABLISHED state.
> 
> Thanks a lot ahead of time for any insights/help!
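You can check the tcp_retries2 reasoning above with a rough model. This is a hedged approximation, not the kernel's code. It assumes the retransmission/probe timer starts at TCP_RTO_MIN (200 ms, plausible on a low-latency LAN), doubles on every expiry, and is capped at TCP_RTO_MAX (120 s). The helper name hypothetical_timeout is illustrative. Under these assumptions, the default tcp_retries2 = 15 covers about 924.6 seconds. That matches the "hypothetical timeout" figure in the kernel's ip-sysctl documentation and is in the same range as the client's 15-minute sleep. The real RTO depends on the measured RTT, and the exact accounting differs between kernel versions.

```python
# Rough model of the time covered by N retransmissions/probes with
# exponential backoff. Assumptions (not taken from the thread):
#   - the timer starts at TCP_RTO_MIN (200 ms),
#   - it doubles on every expiry,
#   - it is capped at TCP_RTO_MAX (120 s).
# hypothetical_timeout() is an illustrative helper, not a kernel function.

TCP_RTO_MIN = 0.2    # seconds
TCP_RTO_MAX = 120.0  # seconds


def hypothetical_timeout(retries: int, rto_base: float = TCP_RTO_MIN) -> float:
    """Sum of the first retries + 1 backoff intervals: the timer armed
    for the original segment plus one timer per retransmission/probe."""
    total = 0.0
    rto = rto_base
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, TCP_RTO_MAX)  # double, but never beyond the cap
    return total


if __name__ == "__main__":
    t = hypothetical_timeout(15)  # default tcp_retries2
    print(f"{t:.1f} s (about {t / 60:.1f} minutes)")
```

Running it prints "924.6 s (about 15.4 minutes)". With retries = 8, the same model gives 102.2 s, which is the value the kernel documentation ties to RFC 1122's 100-second minimum.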

We might have an issue with zero-window probes (Probe0) hitting the
maximum number of retransmits/probes.

I can check this.
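On the client side, TCP keepalive is one way to keep a read() from blocking forever on a connection the server has already discarded. This is a minimal sketch that assumes a Linux client written in Python, not the programs from the thread. The enable_keepalive helper and its default values are illustrative, not recommendations. After the connection has been idle for `idle` seconds, the client starts sending probes. A server that no longer knows the connection should answer with RST, so the blocked read fails with ECONNRESET. If nothing answers at all, the read fails with ETIMEDOUT after `count` unanswered probes.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalive with per-socket tuning (Linux-only options).

    idle:     seconds of inactivity before the first probe (TCP_KEEPIDLE)
    interval: seconds between unanswered probes (TCP_KEEPINTVL)
    count:    unanswered probes before the connection is dropped (TCP_KEEPCNT)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Hypothetical server address; replace with the real one before use:
    # s.connect(("server.example", 12345))
    # A recv() on a connection the server has silently dropped should then
    # fail with ECONNRESET or ETIMEDOUT instead of blocking forever.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Keepalive only helps once the connection is idle. It complements fixing the missing FIN/RST on the server and does not replace that fix.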