Message-ID: <87eelqs9za.fsf@marvin.dmesg.gr>
Date: Thu, 22 Oct 2020 15:47:53 +0300
From: Apollon Oikonomopoulos <apoikos@...sg.gr>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Yuchung Cheng <ycheng@...gle.com>, Netdev <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: TCP sender stuck in persist despite peer advertising non-zero window
Apollon Oikonomopoulos <apoikos@...sg.gr> writes:
> We are now running the patched kernel on the machines involved. I want
> to give it some time just to be sure, so I'll get back to you by
> Thursday if everything goes well.
It has been almost a week now and we have had zero hangs in 60 rsync
runs, so I guess we can call it fixed. We did not notice any ill side
effects either. In the unlikely event it hangs again, I will let you
know.
I spent quite some time pondering this issue and, to be honest, it
troubles me that it seems to have been around for so long without
anyone else noticing. The only reasonable explanation I can come up
with is the following (please comment/correct me if I'm wrong):
1. It will not be triggered by most L7 protocols. In "synchronous"
   request-response protocols such as HTTP, each side will usually
   consume all available data before sending. In this case, even if
   snd_wl1 wraps around, the bulk receiver is left with a non-zero
   window and is still able to send out data, causing the next
   acknowledgment to update the window and adjust snd_wl1. Also, I
   cannot think of any asynchronous protocol apart from rsync where
   the server sends out multi-GB responses without checking for
   incoming data in the process.
2. Regardless of the application protocol, the receiver must stay in
   the fast path with a zero send window long enough to receive at
   least 2 GB of data, so that the incoming sequence numbers advance
   more than 2^31 bytes past the stale snd_wl1, yet not so long that
   after(ack_seq, snd_wl1) becomes true again; see the sketch after
   this list. In practice this means that header prediction must not
   fail (not even once!) and we must never run out of receive space,
   as either condition would send us to the slow path and call
   tcp_ack(). I'd argue this is likely to happen only with stable,
   long-running, low- or moderately-paced TCP connections in local
   networks where packet loss is minimal (although most of the time
   things move around as fast as they can in a local network). At
   this point I wonder if the userspace rate-limiting we enabled on
   rsync actually did more harm…
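To make the wraparound in (2) concrete, here is a quick standalone
sketch of how I understand the check: after() mirrors the macro in
include/net/tcp.h and may_update_snd_wnd() is modeled on
tcp_may_update_snd_wnd() in net/ipv4/tcp_input.c, while the numbers
in main() are made up purely for illustration:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* True iff seq2 comes after seq1 in 32-bit wraparound arithmetic,
 * as in include/net/tcp.h. */
static bool after(uint32_t seq2, uint32_t seq1)
{
        return (int32_t)(seq1 - seq2) < 0;
}

/* Modeled on tcp_may_update_snd_wnd(): accept a window update only
 * if the ACK advances snd_una, or the segment is "newer" than the
 * one that last updated the window (tracked in snd_wl1). */
static bool may_update_snd_wnd(uint32_t snd_una, uint32_t snd_wl1,
                               uint32_t ack, uint32_t ack_seq,
                               uint32_t nwin, uint32_t snd_wnd)
{
        return after(ack, snd_una) ||
               after(ack_seq, snd_wl1) ||
               (ack_seq == snd_wl1 && nwin > snd_wnd);
}

int main(void)
{
        uint32_t snd_wl1 = 1000;  /* stale: not updated in the fast path */
        uint32_t snd_una = 5000;  /* nothing in flight, so ack == snd_una */

        /* More than 2 GB received since snd_wl1 was last updated: the
         * peer's sequence numbers now compare as "before" snd_wl1. */
        uint32_t ack_seq = snd_wl1 + (1u << 31) + 1;

        /* The window-opening ACK (nwin = 65535) is rejected and we
         * stay in persist with snd_wnd == 0; this prints "no". */
        printf("window update accepted: %s\n",
               may_update_snd_wnd(snd_una, snd_wl1, snd_una, ack_seq,
                                  65535, 0) ? "yes" : "no");
        return 0;
}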
Finally, even if someone hits this, any application that cares about
network timeouts will either fail or reconnect, making the whole
thing look like a "random network glitch" and leaving no traces
behind to debug. And in the unlikely event that your application does
linger forever in the persist state, it takes a fair amount of
stubbornness to get past the assumption that you are doing something
wrong, decide that this might indeed be a kernel bug, and go after
it :)
Thanks again for the fast response!
Best,
Apollon
P.S.: I wonder if it would make sense to expose snd_una and snd_wl1
in struct tcp_info to ease debugging.
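Completely untested, but I imagine something along these lines, with
the new fields appended at the end of struct tcp_info to keep the ABI
stable (tcpi_snd_una and tcpi_snd_wl1 are just names I made up):

--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ struct tcp_info {
 	__u32	tcpi_snd_wnd;	/* peer's advertised receive window */
+	__u32	tcpi_snd_una;	/* first unacknowledged byte */
+	__u32	tcpi_snd_wl1;	/* seq of segment that last updated snd_wnd */
 };

--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_snd_wnd = tp->snd_wnd;
+	info->tcpi_snd_una = tp->snd_una;
+	info->tcpi_snd_wl1 = tp->snd_wl1;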