netdev - Re: [PATCH v2 net-next 4/4] tcp: implement coalescing on backlog queue


Message-ID: <CADVnQym2b722Psew=4sgMhabyDWvv6YdSsk9q=6M09JXXSYJqA@mail.gmail.com>
Date:   Tue, 27 Nov 2018 16:58:18 -0500
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>, jean-louis@...ond.be,
        Yuchung Cheng <ycheng@...gle.com>,
        Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH v2 net-next 4/4] tcp: implement coalescing on backlog queue

On Tue, Nov 27, 2018 at 10:57 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> In case GRO is not as efficient as it should be, or is disabled,
> we might have a user thread trapped in __release_sock() while
> the softirq handler floods packets up to the point we have to drop.
>
> This patch balances work done from the user thread and softirq,
> to give __release_sock() more chances to complete its work
> before new packets are added to the backlog.
>
> This also helps if we receive many ACK packets, since GRO
> does not aggregate them.
>
> This patch brings a ~60% throughput increase on a receiver
> without GRO, but the spectacular gain is really the
> 1000x release_sock() latency reduction I have measured.
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Cc: Neal Cardwell <ncardwell@...gle.com>
> Cc: Yuchung Cheng <ycheng@...gle.com>
> ---
...
> +       if (TCP_SKB_CB(tail)->end_seq != TCP_SKB_CB(skb)->seq ||
> +           TCP_SKB_CB(tail)->ip_dsfield != TCP_SKB_CB(skb)->ip_dsfield ||
> +#ifdef CONFIG_TLS_DEVICE
> +           tail->decrypted != skb->decrypted ||
> +#endif
> +           thtail->doff != th->doff ||
> +           memcmp(thtail + 1, th + 1, hdrlen - sizeof(*th)))
> +               goto no_coalesce;
> +
> +       __skb_pull(skb, hdrlen);
> +       if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) {
> +               thtail->window = th->window;
> +
> +               TCP_SKB_CB(tail)->end_seq = TCP_SKB_CB(skb)->end_seq;
> +
> +               if (after(TCP_SKB_CB(skb)->ack_seq, TCP_SKB_CB(tail)->ack_seq))
> +                       TCP_SKB_CB(tail)->ack_seq = TCP_SKB_CB(skb)->ack_seq;
> +
> +               TCP_SKB_CB(tail)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;

I wonder if technically perhaps the logic should skip coalescing if
the tail or skb has the TCP_FLAG_URG bit set? It seems if skbs are
coalesced, and some have urgent data and some do not, then the
TCP_FLAG_URG bit will be accumulated into the tail header, but there
will be no way to ensure the correct urgent offsets for the one or
more skbs with urgent data are passed along.
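A minimal sketch of the check suggested above, assuming a hypothetical, simplified `seg_cb` struct in place of the kernel's TCP_SKB_CB() control block and a locally defined `FLAG_URG` (0x20, the URG bit in the TCP header). It is an illustration of the idea, not code from the patch: coalescing is refused when either segment carries URG, since one merged header can describe only one urgent pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/* URG bit in the TCP header flags byte; defined locally for this sketch. */
#define FLAG_URG 0x20

/* Hypothetical, simplified stand-in for the kernel's TCP_SKB_CB(skb). */
struct seg_cb {
    uint32_t seq;      /* first sequence number of the payload */
    uint32_t end_seq;  /* sequence number just past the payload */
    uint8_t tcp_flags; /* TCP header flag bits */
};

/* Return true if skb may be appended to tail. The contiguity test mirrors
 * the end_seq/seq comparison in the patch; the URG test is the suggested
 * addition. */
static bool can_coalesce(const struct seg_cb *tail, const struct seg_cb *skb)
{
    if (tail->end_seq != skb->seq)
        return false;
    /* If either segment carries urgent data, keep it separate so its
     * urgent pointer is not lost in a single merged header. */
    if ((tail->tcp_flags | skb->tcp_flags) & FLAG_URG)
        return false;
    return true;
}
```

Testing the OR of both flag bytes covers both cases Neal raises, URG on the tail and URG on the incoming skb, with a single branch.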

Thinking out loud, I guess if this is ECN/DCTCP and some ACKs have
TCP_FLAG_ECE and some don't, this will effectively have all ACKed
bytes be treated as ECN-marked. Probably OK, since if this coalescing
path is being hit the sender may be overloaded and slowing down might
be a good thing.
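To make the ECE point concrete, a minimal sketch of the merge step in the quoted hunk (extend end_seq, keep the later ack_seq, OR the flags), written against a hypothetical simplified `seg_cb` struct rather than the real sk_buff. `seq_after()` is a local stand-in using the same wraparound-safe idea as the kernel's after() helper; the flag constants are defined locally with the standard TCP header bit values.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP header flag bits, defined locally for this sketch. */
#define FLAG_ACK 0x10
#define FLAG_ECE 0x40

/* Hypothetical, simplified stand-in for the kernel's TCP_SKB_CB(skb). */
struct seg_cb {
    uint32_t seq;
    uint32_t end_seq;
    uint32_t ack_seq;
    uint8_t tcp_flags;
};

/* True if sequence number a comes after b in 32-bit modular space. */
static bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Fold skb into tail the way the quoted hunk does: extend end_seq,
 * keep the newest ack_seq, and OR in the flags. An ECE bit on either
 * segment therefore ends up on the merged one. */
static void merge_into(struct seg_cb *tail, const struct seg_cb *skb)
{
    tail->end_seq = skb->end_seq;
    if (seq_after(skb->ack_seq, tail->ack_seq))
        tail->ack_seq = skb->ack_seq;
    tail->tcp_flags |= skb->tcp_flags;
}
```

For example, with snd_una at 2000, separate ACKs for 5000 (no ECE) and 8000 (ECE) would let a DCTCP sender count roughly 3000 of 6000 newly acked bytes as marked; after the merge it sees one ECE-flagged ACK for 8000 and counts all 6000.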

Otherwise, looks great to me. Thanks for doing this!

neal