netdev - Re: Linux ECN Handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Tue, 21 Nov 2017 10:01:40 -0500
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Steve Ibanez <sibanez@...nford.edu>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Netdev <netdev@...r.kernel.org>, Florian Westphal <fw@...len.de>,
        Mohammad Alizadeh <alizadeh@...il.mit.edu>,
        Lawrence Brakmo <brakmo@...com>,
        Yuchung Cheng <ycheng@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: Re: Linux ECN Handling

On Tue, Nov 21, 2017 at 12:58 AM, Steve Ibanez <sibanez@...nford.edu> wrote:
> Hi Neal,
>
> I tried your suggestion to disable tcp_tso_should_defer() and it does
> indeed look like it is preventing the host from entering timeouts.
> I'll have to do a bit more digging to try and find where the packets
> are being dropped. I've verified that the bottleneck link queue is
> capacity is at about the configured marking threshold when the timeout
> occurs, so the drops may be happening at the NIC interfaces or perhaps
> somewhere unexpected in the switch.

Great! Thanks for running that test.

> I wonder if you can explain why the TLP doesn't fire when in the CWR
> state? It seems like that might be worth having for cases like this.

The original motivation for only allowing TLP in the CA_Open state was
to be conservative and avoid having the TLP impose extra load on the
bottleneck when it may be congested. Plus if there are any SACKed
packets in the SACK scoreboard then there are other existing
mechanisms to do speedy loss recovery.

But at various times we have talked about expanding the set of
scenarios where TLP is used. And I think this example demonstrates
that there is a class of real-world cases where it probably makes
sense to allow TLP in the CWR state.

If you have time, would you be able to check if leaving
tcp_tso_should_defer () as-is but enabling TLP probes in CWR state
also fixes your performance issue? Perhaps something like
(uncompiled/untested):

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4ea79b2ad82e..deccf8070f84 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2536,11 +2536,11 @@ bool tcp_schedule_loss_probe(struct sock *sk,
bool advancing_rto)

        early_retrans = sock_net(sk)->ipv4.sysctl_tcp_early_retrans;
        /* Schedule a loss probe in 2*RTT for SACK capable connections
-        * in Open state, that are either limited by cwnd or application.
+        * not in loss recovery, that are either limited by cwnd or application.
         */
        if ((early_retrans != 3 && early_retrans != 4) ||
            !tp->packets_out || !tcp_is_sack(tp) ||
-           icsk->icsk_ca_state != TCP_CA_Open)
+           icsk->icsk_ca_state >= TCP_CA_Recovery)
                return false;

        if ((tp->snd_cwnd > tcp_packets_in_flight(tp)) &&

> Btw, thank you very much for all the help! It is greatly appreciated :)

You are very welcome! :-)

cheers,
neal