Message-ID: <1423141038.31870.38.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Thu, 05 Feb 2015 04:57:18 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Michal Kazior <michal.kazior@...to.com>
Cc: Neal Cardwell <ncardwell@...gle.com>,
linux-wireless <linux-wireless@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
eyalpe@....mellanox.co.il
Subject: Re: Throughput regression with `tcp: refine TSO autosizing`
On Thu, 2015-02-05 at 09:38 +0100, Michal Kazior wrote:
> On 4 February 2015 at 22:11, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > I do not see how a TSO patch could hurt a flow not using TSO/GSO.
> >
> > This makes no sense.
>
> Hmm..
>
> @@ -2018,8 +2053,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
> * of queued bytes to ensure line rate.
> * One example is wifi aggregation (802.11 AMPDU)
> */
> - limit = max_t(unsigned int, sysctl_tcp_limit_output_bytes,
> - sk->sk_pacing_rate >> 10);
> + limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
> + limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
>
> if (atomic_read(&sk->sk_wmem_alloc) > limit) {
> set_bit(TSQ_THROTTLED, &tp->tsq_flags);
>
> Doesn't this effectively invert how tcp_limit_output_bytes is used?
> This would explain why raising the limit wasn't changing anything
> anymore when you asked me to do so. Only decreasing it yielded any
> change.
>
> I've added a printk to show the new and old values. Excerpt from logs:
>
> [ 114.782740] (4608 39126 131072 = 39126) vs (131072 39126 = 131072)
>
> (2*truesize, pacing_rate, tcp_limit = limit) vs (tcp_limit, pacing_rate = limit)
>
> Reverting this patch hunk alone fixes my TCP problem. Not that I'm
> saying the old logic was correct (it seems it wasn't; a limit should
> be applied as min(value, max_value), right?).
>
> Anyway, the change doesn't seem to affect TSO alone, which would
> explain the "makes no sense".
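
For reference, here is a minimal sketch of both computations using the
numbers from your printk (2*truesize = 4608, pacing_rate >> 10 = 39126,
tcp_limit_output_bytes = 131072). This is only an illustration with
made-up helper names, not the actual kernel code path:

#include <stdio.h>

/* helpers named only for this sketch */
static unsigned int max_u(unsigned int a, unsigned int b) { return a > b ? a : b; }
static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }

int main(void)
{
	unsigned int two_truesize = 4608;	/* 2 * skb->truesize             */
	unsigned int pacing       = 39126;	/* sk->sk_pacing_rate >> 10      */
	unsigned int sysctl       = 131072;	/* sysctl_tcp_limit_output_bytes */

	/* old code: the sysctl acts as a floor on the limit */
	unsigned int old_limit = max_u(sysctl, pacing);			/* 131072 */

	/* new code: the sysctl acts as a cap on the limit */
	unsigned int new_limit = min_u(max_u(two_truesize, pacing), sysctl);	/* 39126 */

	printf("old=%u new=%u\n", old_limit, new_limit);
	return 0;
}

So yes: the sysctl used to be a lower bound on the limit and is now an
upper bound, which is why raising it no longer changes anything.
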
The intention is to limit the queue to about 1 ms of buffering, capped
by a configurable value.
On a 40Gbps flow, 1 ms represents 5 MB, which is insane.
We do not want to queue 5 MB of traffic; that would destroy latencies
for all concurrent flows. (Or it would require fq_codel or fq as the
packet scheduler, instead of the default pfifo_fast.)
This is why a 1.5 ms delay between transmit and TX completion is a
problem in your case.
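
Back-of-the-envelope, as a quick sketch (sk_pacing_rate is in bytes per
second, so rate >> 10 is roughly the amount of data sent in 1 ms):

#include <stdio.h>

int main(void)
{
	/* 40 Gbit/s expressed in bytes per second */
	unsigned long long rate_bits  = 40ULL * 1000 * 1000 * 1000;
	unsigned long long rate_bytes = rate_bits / 8;		/* 5 GB/s  */

	/* >> 10 is roughly divide-by-1000: about 1 ms worth of data */
	unsigned long long one_ms     = rate_bytes >> 10;	/* ~4.8 MB */

	printf("~%llu bytes buffered for 1 ms at 40Gbps\n", one_ms);
	return 0;
}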