netdev - Re: Linux ECN Handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CADVnQynYn0hLJdmBE+=sy8jtVNi6xZexq0T+Lzukub60ft0VAA@mail.gmail.com>
Date:   Mon, 20 Nov 2017 10:05:47 -0500
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Steve Ibanez <sibanez@...nford.edu>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Netdev <netdev@...r.kernel.org>, Florian Westphal <fw@...len.de>,
        Mohammad Alizadeh <alizadeh@...il.mit.edu>,
        Lawrence Brakmo <brakmo@...com>
Subject: Re: Linux ECN Handling

On Mon, Nov 20, 2017 at 2:31 AM, Steve Ibanez <sibanez@...nford.edu> wrote:
> Hi Folks,
>
> I wanted to check back in on this for another update and to solicit
> some more suggestions. I did a bit more digging to try an isolate the
> problem.

Going back to one of your Oct 19 trace snapshots (attached), AFAICT at
the time of the timeout there is actually almost 64KBytes  (352553398
+ 1448 - 352489686 = 65160) of unacknowledged data. So there really
does seem to be a significant chunk of packets that were in-flight
that were then declared lost.

So here is a possibility: perhaps the combination of CWR+PRR plus
tcp_tso_should_defer() means that PRR can make cwnd so gentle that
tcp_tso_should_defer() thinks we should wait for another ACK to send,
and that ACK doesn't come. Breaking it, down, the potential sequence
would be:

(1) tcp_write_xmit() does not send, because the CWR behavior, using
PRR, does not leave enough cwnd for tcp_tso_should_defer() to think we
should send (PRR was originally designed for recovery, which did not
have TSO deferral)

(2) TLP does not fire, because we are in state CWR, not Open

(3) The only remaining option is an RTO, which fires.

In other words, the possibility is that, at the time of the stall, the
cwnd is reasonably high, but tcp_packets_in_flight() is also quite
high, so either there is (a) literally no unused cwnd left (
tcp_packets_in_flight() == cwnd), or (b) some mechanism like
tcp_tso_should_defer() is deciding that there is not enough available
cwnd for it to make sense to chop off a fraction of a TSO skb to send
now.

One way to test that conjecture would be to disable
tcp_tso_should_defer() by adding a:

   goto send_now;

at the top of tcp_tso_should_defer().

If that doesn't prevent the freezes then I would recommend adding
printks or other instrumentation to  tcp_write_xmit() to log:

- time
- ca_state
- cwnd
- ssthresh
- tcp_packets_in_flight()
- the reason for breaking out of the tcp_write_xmit() loop (tso
deferral, no packets left, tcp_snd_wnd_test, tcp_nagle_test, etc)

cheers,
neal

Download attachment "han-3_timeout-event.png" of type "image/png" (91597 bytes)