netdev - Re: Linux ECN Handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Date:   Mon, 20 Nov 2017 07:40:52 -0800
From:   Eric Dumazet <edumazet@...gle.com>
To:     Neal Cardwell <ncardwell@...gle.com>
Cc:     Steve Ibanez <sibanez@...nford.edu>,
        Daniel Borkmann <daniel@...earbox.net>,
        Netdev <netdev@...r.kernel.org>, Florian Westphal <fw@...len.de>,
        Mohammad Alizadeh <alizadeh@...il.mit.edu>,
        Lawrence Brakmo <brakmo@...com>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: Linux ECN Handling

On Mon, Nov 20, 2017 at 7:01 AM, Neal Cardwell <ncardwell@...gle.com> wrote:
> Going back to one of your Oct 19 trace snapshots (attached), AFAICT at the
> time of the timeout there is actually almost 64KBytes  (352553398 + 1448 -
> 352489686 = 65160) of unacknowledged data. So there really does seem to be a
> significant chunk of packets that were in-flight that were then declared
> lost.
>
> So here is a possibility: perhaps the combination of CWR+PRR plus
> tcp_tso_should_defer() means that PRR can make cwnd so gentle that
> tcp_tso_should_defer() thinks we should wait for another ACK to send, and
> that ACK doesn't come. Breaking it, down, the potential sequence would be:
>
> (1) tcp_write_xmit() does not send, because the CWR behavior, using PRR,
> does not leave enough cwnd for tcp_tso_should_defer() to think we should
> send (PRR was originally designed for recovery, which did not have TSO
> deferral)
>
> (2) TLP does not fire, because we are in state CWR, not Open
>
> (3) The only remaining option is an RTO, which fires.
>
> In other words, the possibility is that, at the time of the stall, the cwnd
> is reasonably high, but tcp_packets_in_flight() is also quite high, so
> either there is (a) literally no unused cwnd left ( tcp_packets_in_flight()
> == cwnd), or (b) some mechanism like tcp_tso_should_defer() is deciding that
> there is not enough available cwnd for it to make sense to chop off a
> fraction of a TSO skb to send now.
>
> One way to test that conjecture would be to disable tcp_tso_should_defer()
> by adding a:
>
>    goto send_now;
>
> at the top of tcp_tso_should_defer().
>
> If that doesn't prevent the freezes then I would recommend adding printks or
> other instrumentation to  tcp_write_xmit() to log:
>
> - time
> - ca_state
> - cwnd
> - ssthresh
> - tcp_packets_in_flight()
> - the reason for breaking out of the tcp_write_xmit() loop (tso deferral, no
> packets left, tcp_snd_wnd_test, tcp_nagle_test, etc)
>
> cheers,
> neal
>
>
>
> On Mon, Nov 20, 2017 at 2:31 AM, Steve Ibanez <sibanez@...nford.edu> wrote:
>>
>> Hi Folks,
>>
>> I wanted to check back in on this for another update and to solicit
>> some more suggestions. I did a bit more digging to try an isolate the
>> problem.
>>
>> As I explained earlier, the log generated by tcp_probe indicates that
>> the snd_cwnd is set to 1 just before the end host receives an ECN
>> marked ACK and unexpectedly enters a timeout (
>> https://drive.google.com/open?id=1iyt8PvBxQga2jpRpBJ8KdQw3Q_mPTzZF ).
>> I was trying to track down where this is happening, but the only place
>> I could find that might be setting the snd_cwnd to 1 is in the
>> tcp_enter_loss() function. I inserted a printk() call in this function
>> to see when it is being invoked and it looks like it is only called by
>> the tcp_retransmit_timer() function after the timer expires.
>>
>> I decided to try recording the snd_cwnd, ss-thresh, and icsk_ca_state
>> inside the tcp_fastretrans_alert() function whenever it processes an
>> ECN marked ACK (
>> https://drive.google.com/open?id=17GD77lb9lkCSu0_s9p40GZ5r4EU8B4VB )
>> This plot also shows when the tcp_retransmit_timer() and
>> tcp_enter_loss() functions are invoked (red and purple dots
>> respectively). And I see that the ACK state machine is always either
>> in the TCP_CA_Open or TCP_CA_CWR state whenever the
>> tcp_fastretrans_alert() function processes ECN marked ACKs (
>> https://drive.google.com/open?id=1xwuPxjgwriT9DSblFx2uILfQ95Fy-Eqq ).
>> So I'm not sure where the snd_cwnd is being set to 1 (or possibly 0 as
>> Neal suggested) just before entering a timeout. Any suggestions here?
>>
>> In order to do a bit of profiling of the tcp_dctcp code I added
>> support into tcp_probe for recording the dctcp alpha parameter. I see
>> that alpha oscillates around about 0.1 when the flow rates have
>> converged, it goes to zero when the other host enters a timeout, and I
>> don't see any unexpected behavior just before the timeout (
>> https://drive.google.com/open?id=1zPdyS57TrUYZIekbid9p1UNyraLYrdw7 ).
>>
>> So I haven't had much luck yet trying to track down where the problem
>> is. If you have any suggestions that would help me to focus my search
>> efforts, I would appreciate the comments.
>>
>> Thanks!
>> -Steve

Steve, what HZ value your kernel is compiled with ?