Message-ID: <CADVnQynzhKngDM20YA-bGsLGq8k_5ikU3w0YDpdg8Pk5eMsssw@mail.gmail.com>
Date: Tue, 5 Dec 2017 10:23:29 -0500
From: Neal Cardwell <ncardwell@...gle.com>
To: Steve Ibanez <sibanez@...nford.edu>
Cc: Eric Dumazet <edumazet@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>,
Daniel Borkmann <daniel@...earbox.net>,
Netdev <netdev@...r.kernel.org>, Florian Westphal <fw@...len.de>,
Mohammad Alizadeh <alizadeh@...il.mit.edu>,
Lawrence Brakmo <brakmo@...com>
Subject: Re: Linux ECN Handling
On Tue, Dec 5, 2017 at 12:22 AM, Steve Ibanez <sibanez@...nford.edu> wrote:
> Hi Neal,
>
> Happy to help out :) And thanks for the tip!
>
> I was able to track down where the missing bytes that you pointed out
> are being lost. It turns out the destination host seems to be
> misbehaving. I performed a packet capture at the destination host
> interface (a snapshot of the trace is attached). I see the following
> sequence of events when a timeout occurs (note that I have NIC
> offloading enabled so wireshark captures packets larger than the MTU):
>
> 1. The destination receives a data packet of length X with seqNo = Y
> from the src with the CWR bit set and does not send back a
> corresponding ACK.
> 2. The source times out and sends a retransmission packet of length Z
> (where Z < X) with seqNo = Y
> 3. The destination sends back an ACK with AckNo = Y + X
>
> So in other words, the packet which the destination host does not
> initially ACK (causing the timeout) does not actually get lost because
> after receiving the retransmission the AckNo moves forward all the way
> past the bytes in the initial unACKed CWR packet. In the attached
> screenshot, I've marked the unACKed CWR packet with a red box.
>
> Have you seen this behavior before? And do you know what might be
> causing the destination host not to ACK the CWR packet? In most cases
> the CWR marked packets are ACKed properly, it's just occasionally they
> are not.
Thanks for the detailed report!
I have not heard of an incoming CWR causing the receiver to fail to
ACK. And in re-reading the code, I don't see an obvious way in which a
CWR bit should cause the receiver to fail to ACK.
That screen shot is a bit hard to parse. Would you be able to post a
tcpdump .pcap of that particular section, or post a screen shot of a
time-sequence plot of that section?
To extract that segment and take a screenshot, you could use something like:
  editcap -A "2017-12-04 11:22:27" -B "2017-12-04 11:22:30" all.pcap slice.pcap
  tcptrace -S -xy -zy slice.pcap
  xplot.org a2b_tsg.xpl &
  # take screenshot
Or, alternatively, would you be able to post the slice.pcap on a web
server or public drive?
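For locating the affected packets inside a slice, the three-step pattern described in the quoted report (a CWR data segment that gets no ACK, then a retransmission at the same seqNo) could be checked with something like the rough sketch below. It is only an illustration: the Pkt record, its field names, and the unacked_cwr_segments helper are hypothetical, the packets are assumed to have already been parsed out of the capture by some other tool, and sequence-number wraparound is ignored. The only protocol fact relied on is that CWR is bit 0x80 of the TCP flags byte.

```python
from typing import List, NamedTuple

CWR = 0x80  # CWR bit in the TCP flags byte (RFC 3168)
ACK = 0x10  # ACK bit in the TCP flags byte


class Pkt(NamedTuple):
    """Hypothetical, simplified record for one captured TCP packet."""
    t: float         # capture timestamp in seconds
    from_src: bool   # True for src -> dst (data), False for dst -> src (ACKs)
    seq: int         # sequence number of the first payload byte
    length: int      # payload length in bytes
    ack: int         # acknowledgment number (used when from_src is False)
    flags: int       # TCP flags byte


def unacked_cwr_segments(pkts: List[Pkt]) -> List[Pkt]:
    """Return CWR data segments that were retransmitted (same seq) before
    the receiver sent any ACK covering their first byte."""
    suspects = []
    pending = {}  # seq -> CWR data segment still waiting for an ACK
    for p in pkts:
        if p.from_src and p.length > 0:
            if p.seq in pending:
                # Same seqNo seen again with no ACK in between: a retransmit.
                suspects.append(pending.pop(p.seq))
            elif p.flags & CWR:
                pending[p.seq] = p
        elif not p.from_src:
            # An ACK beyond a pending seq means that segment was ACKed.
            for s in [s for s in pending if p.ack > s]:
                del pending[s]
    return suspects


if __name__ == "__main__":
    # Made-up numbers, for illustration only (not taken from the trace).
    trace = [
        Pkt(0.000, True, 1000, 3000, 0, ACK | CWR),  # CWR data, no ACK follows
        Pkt(0.300, True, 1000, 1448, 0, ACK),        # shorter retransmission
        Pkt(0.301, False, 0, 0, 4000, ACK),          # ACK jumps to Y + X
    ]
    for seg in unacked_cwr_segments(trace):
        print(f"t={seg.t:.3f} seq={seg.seq} len={seg.length}")
```

The timestamps it prints could then be used as the -A and -B bounds for editcap.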
thanks,
neal