netdev - Re: Linux ECN Handling

Message-ID: <CACJspmLaxdHoa63jCuD-mKJS35BZ69b9qw3tEZjFxbUNb3PSHg@mail.gmail.com>
Date:   Tue, 5 Dec 2017 11:36:44 -0800
From:   Steve Ibanez <sibanez@...nford.edu>
To:     Neal Cardwell <ncardwell@...gle.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        Yuchung Cheng <ycheng@...gle.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Netdev <netdev@...r.kernel.org>, Florian Westphal <fw@...len.de>,
        Mohammad Alizadeh <alizadeh@...il.mit.edu>,
        Lawrence Brakmo <brakmo@...com>
Subject: Re: Linux ECN Handling

Hi Neal,

I've included a link to a small trace of 13 packets, which is different
from the screenshot I attached in my last email but shows the same
sequence of events. It's a bit hard to read the tcptrace due to the
300ms timeout, so I figured this was the best approach.

slice.pcap: https://drive.google.com/open?id=1hYXbUClHGbQv1hWG1HZWDO2WYf30N6G8
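
For anyone who wants to scan a longer capture for the same pattern, here
is a minimal sketch of the three-step sequence described in the quoted
message below. It assumes the packets have already been parsed into
simple records; the Pkt type and find_unacked_cwr() are hypothetical
names used only for illustration, not part of any existing tool, and
sequence-number wraparound is ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pkt:
    """Hypothetical, simplified view of one captured TCP segment."""
    t: float         # capture timestamp in seconds
    from_src: bool   # True: data sender -> receiver, False: receiver -> sender
    seq: int         # sequence number of the first payload byte
    length: int      # payload length in bytes (may exceed the MTU with offloads)
    ack: int         # acknowledgment number
    cwr: bool        # CWR flag set


def find_unacked_cwr(pkts: List[Pkt]) -> List[Tuple[Pkt, Pkt, Pkt]]:
    """Return (cwr_pkt, retransmission, ack) triples matching:
    1. data of length X at seqNo Y with CWR set, not ACKed,
    2. a retransmission of length Z < X at seqNo Y,
    3. an ACK with AckNo = Y + X.
    Assumes no sequence-number wraparound within the trace."""
    hits = []
    for i, p in enumerate(pkts):
        if not (p.from_src and p.cwr and p.length > 0):
            continue
        end = p.seq + p.length          # Y + X
        retrans = None
        for q in pkts[i + 1:]:
            if retrans is None:
                if not q.from_src and q.ack >= end:
                    break               # ACKed before any retransmission
                if q.from_src and q.seq == p.seq and 0 < q.length < p.length:
                    retrans = q         # step 2: shorter retransmission at Y
            elif not q.from_src and q.ack == end:
                hits.append((p, retrans, q))   # step 3: ACK jumps to Y + X
                break
    return hits


# Made-up example values, not taken from slice.pcap.
example = [
    Pkt(0.0000, True, 1000, 3000, 1, True),    # CWR data, never ACKed
    Pkt(0.3000, True, 1000, 1448, 1, False),   # retransmission after timeout
    Pkt(0.3001, False, 1, 0, 4000, False),     # ACK covers the full 3000 bytes
]
for cwr_pkt, rtx, ack in find_unacked_cwr(example):
    print(f"CWR seq={cwr_pkt.seq} len={cwr_pkt.length} "
          f"retransmitted after {rtx.t - cwr_pkt.t:.3f}s, ack={ack.ack}")
```

The records could be filled from tshark or tcpdump output; that parsing
step is left out here on purpose.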

Thanks for the help!
-Steve

On Tue, Dec 5, 2017 at 7:23 AM, Neal Cardwell <ncardwell@...gle.com> wrote:
> On Tue, Dec 5, 2017 at 12:22 AM, Steve Ibanez <sibanez@...nford.edu> wrote:
>> Hi Neal,
>>
>> Happy to help out :) And thanks for the tip!
>>
>> I was able to track down where the missing bytes that you pointed out
>> are being lost. It turns out the destination host seems to be
>> misbehaving. I performed a packet capture at the destination host
>> interface (a snapshot of the trace is attached). I see the following
>> sequence of events when a timeout occurs (note that I have NIC
>> offloading enabled so wireshark captures packets larger than the MTU):
>>
>> 1. The destination receives a data packet of length X with seqNo = Y
>> from the src with the CWR bit set and does not send back a
>> corresponding ACK.
>> 2. The source times out and sends a retransmission packet of length Z
>> (where Z < X) with seqNo = Y
>> 3. The destination sends back an ACK with AckNo = Y + X
>>
>> So in other words, the packet which the destination host does not
>> initially ACK (causing the timeout) does not actually get lost because
>> after receiving the retransmission the AckNo moves forward all the way
>> past the bytes in the initial unACKed CWR packet. In the attached
>> screenshot, I've marked the unACKed CWR packet with a red box.
>>
>> Have you seen this behavior before? And do you know what might be
>> causing the destination host not to ACK the CWR packet? In most cases
>> the CWR-marked packets are ACKed properly; it's just that occasionally
>> they are not.
>
> Thanks for the detailed report!
>
> I have not heard of an incoming CWR causing the receiver to fail to
> ACK. And in re-reading the code, I don't see an obvious way in which a
> CWR bit should cause the receiver to fail to ACK.
>
> That screen shot is a bit hard to parse. Would you be able to post a
> tcpdump .pcap of that particular section, or post a screen shot of a
> time-sequence plot of that section?
>
> To extract that segment and take a screenshot, you could use something like:
>
>   editcap -A "2017-12-04 11:22:27" -B "2017-12-04 11:22:30" all.pcap slice.pcap
>   tcptrace -S -xy -zy slice.pcap
>   xplot.org a2b_tsg.xpl &
>   # take screenshot
>
> Or, alternatively, would you be able to post the slice.pcap on a web
> server or public drive?
>
> thanks,
> neal