netdev - Re: Linux ECN Handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACJspmLJWE0Ld1ZW_PgKoyeNgbDumL-P3mWAN-8WZRM9fGVvFg@mail.gmail.com>
Date:   Sun, 19 Nov 2017 23:31:04 -0800
From:   Steve Ibanez <sibanez@...nford.edu>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Neal Cardwell <ncardwell@...gle.com>,
        Netdev <netdev@...r.kernel.org>, Florian Westphal <fw@...len.de>,
        Mohammad Alizadeh <alizadeh@...il.mit.edu>,
        Lawrence Brakmo <brakmo@...com>
Subject: Re: Linux ECN Handling

Hi Folks,

I wanted to check back in on this for another update and to solicit
some more suggestions. I did a bit more digging to try an isolate the
problem.

As I explained earlier, the log generated by tcp_probe indicates that
the snd_cwnd is set to 1 just before the end host receives an ECN
marked ACK and unexpectedly enters a timeout (
https://drive.google.com/open?id=1iyt8PvBxQga2jpRpBJ8KdQw3Q_mPTzZF ).
I was trying to track down where this is happening, but the only place
I could find that might be setting the snd_cwnd to 1 is in the
tcp_enter_loss() function. I inserted a printk() call in this function
to see when it is being invoked and it looks like it is only called by
the tcp_retransmit_timer() function after the timer expires.

I decided to try recording the snd_cwnd, ss-thresh, and icsk_ca_state
inside the tcp_fastretrans_alert() function whenever it processes an
ECN marked ACK (
https://drive.google.com/open?id=17GD77lb9lkCSu0_s9p40GZ5r4EU8B4VB )
This plot also shows when the tcp_retransmit_timer() and
tcp_enter_loss() functions are invoked (red and purple dots
respectively). And I see that the ACK state machine is always either
in the TCP_CA_Open or TCP_CA_CWR state whenever the
tcp_fastretrans_alert() function processes ECN marked ACKs (
https://drive.google.com/open?id=1xwuPxjgwriT9DSblFx2uILfQ95Fy-Eqq ).
So I'm not sure where the snd_cwnd is being set to 1 (or possibly 0 as
Neal suggested) just before entering a timeout. Any suggestions here?

In order to do a bit of profiling of the tcp_dctcp code I added
support into tcp_probe for recording the dctcp alpha parameter. I see
that alpha oscillates around about 0.1 when the flow rates have
converged, it goes to zero when the other host enters a timeout, and I
don't see any unexpected behavior just before the timeout (
https://drive.google.com/open?id=1zPdyS57TrUYZIekbid9p1UNyraLYrdw7 ).

So I haven't had much luck yet trying to track down where the problem
is. If you have any suggestions that would help me to focus my search
efforts, I would appreciate the comments.

Thanks!
-Steve


On Mon, Nov 6, 2017 at 3:31 PM, Steve Ibanez <sibanez@...nford.edu> wrote:
> Hi Daniel,
>
> Apologies for the delay. I tried out Neal's suggestion to printk the
> cwnd and ss-thresh in the tcp_init_cwnd_reduction(),
> tcp_cwnd_reduction(), and tcp_end_cwnd_reduction() functions in
> tcp_input.c. From what I can tell, none of these functions are setting
> the cwnd to 0.
>
> Here is the kernel log with the cwnd and ss-thresh print statements:
> https://drive.google.com/open?id=1LEWIkz64NuZN3yuDpBOAXbUfJfiju55O
> And here is the corresponding packet trace at this end host:
> https://drive.google.com/open?id=1qf4cSW3wzsiwPngcYpZY-AoBspuqONLH
> (The kernel log buffer was not large enough to capture the full
> 3-second experiment, so there's only about a second of data for the
> log buffer and about 3 seconds for the packet trace.)
>
> Here are is a plot of the cwnd and ss-thresh from each of the three functions:
>     - tcp_init_cwnd_reduction:
> https://drive.google.com/open?id=1KOEXG2ISJQMi9c6KyPOQ6rpVUVsQwtWU
>     - tcp_cwnd_reduction:
> https://drive.google.com/open?id=1awoPWC3hi4CGZt7HyuI4aAaLG1LPLwJE
>     - tcp_end_cwnd_reduction:
> https://drive.google.com/open?id=1G7XUSnkX8tP7Z5XdY2O97OWj6jguQHO5
>
> Here is a plot of the measured flow rates:
> https://drive.google.com/open?id=1XwmGve10J4qa1nPE3LustK8NbvhZscac
>
> The kernel log and packet trace data was collected on the 10.0.0.3
> host. The cwnd and ss-thresh plots are from the final second or so of
> the experiment and they show two timeout events. In the first event,
> the 10.0.0.1 host times out allowing 10.0.0.3 to increase it's cwnd.
> And in the second event, the 10.0.0.3 host times out causing the cwnd
> to decrease from ~100 to about ~10. The cwnd samples from tcp_probe (
> https://drive.google.com/open?id=1QCuPspLqbGoA68MKTaAh7rx2wCv3Cr_e )
> indicate that the cwnd is 1 MSS just before the timeout event, but I
> don't see that in the data collected from the tcp_*_cwnd_reduction
> functions.
>
> Here is a diff of the changes that I applied to the tcp_input.c file:
> https://drive.google.com/open?id=1k5x3AkfTr3tJhohSIcmQp-3g2yTVNMWm
>
> Are there other places in the code that you would suggest I check for
> how the cwnd and ss-thresh are changing?
>
> Thanks,
> -Steve
>
>
> On Mon, Nov 6, 2017 at 6:08 AM, Daniel Borkmann <daniel@...earbox.net> wrote:
>> On 10/24/2017 03:11 AM, Neal Cardwell wrote:
>>>
>>> On Mon, Oct 23, 2017 at 6:15 PM, Steve Ibanez <sibanez@...nford.edu>
>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I upgraded the kernel on all of our machines to Linux
>>>> 4.13.8-041308-lowlatency. However, I'm still observing the same
>>>> behavior where the source enters a timeout when the CWND=1MSS and it
>>>> receives ECN marks.
>>>>
>>>> Here are the measured flow rates:
>>>>
>>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONT0VXMUt6WHhKREE/view?usp=sharing>
>>>>
>>>> Here are snapshots of the packet traces at the sources when they both
>>>> enter a timeout at t=1.6sec:
>>>>
>>>> 10.0.0.1 timeout event:
>>>>
>>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONcl9WRnRPazg2ems/view?usp=sharing>
>>>>
>>>> 10.0.0.3 timeout event:
>>>>
>>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONeDlxRjNXa0VzWm8/view?usp=sharing>
>>>>
>>>> Both still essentially follow the same sequence of events that I
>>>> mentioned earlier:
>>>> (1) receives an ACK for byte XYZ with the ECN flag set
>>>> (2) stops sending for RTO_min=300ms
>>>> (3) sends a retransmission for byte XYZ
>>>>
>>>> The cwnd samples reported by tcp_probe still indicate that the sources
>>>> are reacting to the ECN marks more than once per window. Here are the
>>>> cwnd samples at the same timeout event mentioned above:
>>>>
>>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONdEZQdktpaW5JUm8/view?usp=sharing>
>>>>
>>>> Let me know if there is anything else you think I should try.
>>>
>>>
>>> Sounds like perhaps cwnd is being set to 0 somewhere in this DCTCP
>>> scenario. Would you be able to add printk statements in
>>> tcp_init_cwnd_reduction(), tcp_cwnd_reduction(), and
>>> tcp_end_cwnd_reduction(), printing the IP:port, tp->snd_cwnd, and
>>> tp->snd_ssthresh?
>>>
>>> Based on the output you may be able to figure out where cwnd is being
>>> set to zero. If not, could you please post the printk output and
>>> tcpdump traces (.pcap, headers-only is fine) from your tests?
>>
>>
>> Hi Steve, do you have any updates on your debugging?
>>
>>> thanks,
>>> neal
>>>
>>