netdev - Re: [PATCH net-next] tcp: refine TSO autosizing

Message-ID: <CADVnQynB32C8vWqX-Tem-GpHuSs+AsG6s8_M1Og=ru1cFVNBcw@mail.gmail.com>
Date:	Fri, 5 Dec 2014 10:32:33 -0500
From:	Neal Cardwell <ncardwell@...gle.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Yuchung Cheng <ycheng@...gle.com>,
	Nandita Dukkipati <nanditad@...gle.com>
Subject: Re: [PATCH net-next] tcp: refine TSO autosizing

On Fri, Dec 5, 2014 at 9:15 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> From: Eric Dumazet <edumazet@...gle.com>
>
> Commit 95bd09eb2750 ("tcp: TSO packets automatic sizing") tried to
> control TSO size, but did this at the wrong place (sendmsg() time)
>
> At sendmsg() time, we might have a pessimistic view of flow rate,
> and we end up building very small skbs (with 2 MSS per skb).
>
> This is bad because :
>
>  - It sends small TSO packets even in Slow Start where rate quickly
>    increases.
>  - It tends to make socket write queue very big, increasing tcp_ack()
>    processing time, but also increasing memory needs, not necessarily
>    accounted for, as fast clones overhead is currently ignored.
>  - Lower GRO efficiency and more ACK packets.
>
> Servers with a lot of short-lived connections suffer from this.
>
> Let's instead fill skbs as much as possible (64KB of payload), but split
> them at xmit time, when we have a precise idea of the flow rate.
> skb split is actually quite efficient.

Nice. I definitely agree this is the right direction.

However, from my experience testing a variant of this approach, this
kind of late decision about packet size was sometimes causing
performance shortfalls on long-RTT, medium-bandwidth paths unless
tcp_tso_should_defer() was also modified to use the new/smaller packet
size goal.

The issue is that tcp_tso_should_defer() uses tp->xmit_size_goal_segs
as a yardstick, and says, "hey, if cwnd and rwin allow us to send
tp->xmit_size_goal_segs * tp->mss_cache then let's go ahead and send
it now."

But if we remove the sendmsg-time autosizing logic that was tuning
tp->xmit_size_goal_segs, then tcp_tso_should_defer() is now going to
be waiting to try to accumulate permission to send a big skb with
tp->xmit_size_goal_segs (e.g. ~40) MSS in it.

In my tests I was able to fix this issue by making
tcp_tso_should_defer() use the latest size goal instead of
tp->xmit_size_goal_segs.
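
To make the yardstick problem concrete, here is a minimal userspace sketch
of the check described above. It is not the kernel's
tcp_tso_should_defer(); the function name can_send_now() and its
parameters are hypothetical, and the real function considers more than
this single comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the "send now?" test described
 * above. cwnd_room and rwin_room are the bytes that the congestion
 * window and the receive window currently allow us to send.
 */
static bool can_send_now(uint32_t cwnd_room, uint32_t rwin_room,
                         uint32_t goal_segs, uint32_t mss)
{
    /* The usable window is the smaller of the two limits. */
    uint32_t limit = cwnd_room < rwin_room ? cwnd_room : rwin_room;

    assert(mss > 0);

    /* Send now only if a full goal-sized chunk fits in the window. */
    return (uint64_t)limit >= (uint64_t)goal_segs * mss;
}
```

With an MSS of 1448 and room for 10 MSS (14480 bytes), a stale goal of 40
segments makes this check say "wait", while a rate-based goal of 2
segments lets the data go out immediately.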

So, how about computing the rate-based TSO autosizing goal (stored in
"segs" in this patch) at the top of tcp_write_xmit()? Then we could
pass that segment goal to tcp_tso_should_defer() for use instead of
tp->xmit_size_goal_segs in deciding whether we have a big enough chunk
to send now. Similarly, that segment goal could be passed to
tcp_mss_split_point() instead of sk->sk_gso_max_segs.

(The autosizing calculation could be in a helper function to keep
tcp_write_xmit() manageable.)
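
A rough userspace sketch of what such a helper might look like follows.
Everything here is an assumption for illustration: the name
tso_segs_goal(), its parameters, and the sizing rule (roughly one
millisecond of data at the pacing rate, clamped between a minimum and a
maximum segment count) are not taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive a per-skb segment goal from the flow's
 * pacing rate. Assumed rule: about 1 ms worth of bytes (rate >> 10),
 * converted to whole segments and clamped to [min_segs, max_segs].
 */
static uint32_t tso_segs_goal(uint64_t pacing_rate_bytes_per_sec,
                              uint32_t mss,
                              uint32_t min_segs, uint32_t max_segs)
{
    uint64_t bytes, segs;

    assert(mss > 0);
    assert(min_segs <= max_segs);

    /* Shifting by 10 divides by 1024, approximating one millisecond. */
    bytes = pacing_rate_bytes_per_sec >> 10;
    segs = bytes / mss;

    if (segs < min_segs)
        segs = min_segs;    /* slow flows still get a small burst */
    if (segs > max_segs)
        segs = max_segs;    /* fast flows are capped by the device limit */

    return (uint32_t)segs;
}
```

The caller would compute this once per tcp_write_xmit() invocation and
hand the result to both the deferral check and the split-point logic, so
that all three agree on the same size goal.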

neal