Message-ID: <1423141038.31870.38.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Thu, 05 Feb 2015 04:57:18 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Michal Kazior <michal.kazior@...to.com>
Cc: Neal Cardwell <ncardwell@...gle.com>,
linux-wireless <linux-wireless@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
eyalpe@....mellanox.co.il
Subject: Re: Throughput regression with `tcp: refine TSO autosizing`
On Thu, 2015-02-05 at 09:38 +0100, Michal Kazior wrote:
> On 4 February 2015 at 22:11, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > I do not see how a TSO patch could hurt a flow not using TSO/GSO.
> >
> > This makes no sense.
>
> Hmm..
>
> @@ -2018,8 +2053,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
> * of queued bytes to ensure line rate.
> * One example is wifi aggregation (802.11 AMPDU)
> */
> - limit = max_t(unsigned int, sysctl_tcp_limit_output_bytes,
> - sk->sk_pacing_rate >> 10);
> + limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
> + limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
>
> if (atomic_read(&sk->sk_wmem_alloc) > limit) {
> set_bit(TSQ_THROTTLED, &tp->tsq_flags);
>
> Doesn't this effectively invert how tcp_limit_output_bytes is used?
> This would explain why raising the limit wasn't changing anything
> anymore when you asked me to do so. Only decreasing it yielded any
> change.
>
> I've added a printk to show the new and old values. Excerpt from logs:
>
> [ 114.782740] (4608 39126 131072 = 39126) vs (131072 39126 = 131072)
>
> (2*truesize, pacing_rate, tcp_limit = limit) vs (tcp_limit, pacing_rate = limit)
>
> Reverting this patch hunk alone fixes my TCP problem. Not that I'm
> saying the old logic was correct (it seems it wasn't; a limit should
> be applied as min(value, max_value), right?).
>
> Anyway, the change doesn't seem to affect TSO alone, which would
> explain the "makes no sense".
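
For reference, here is a minimal sketch of both computations using the
numbers from your printk (2*truesize = 4608, pacing_rate >> 10 = 39126,
tcp_limit_output_bytes = 131072). This is only an illustration with
made-up helper names, not the actual kernel code path:

#include <stdio.h>

/* helpers named only for this sketch */
static unsigned int max_u(unsigned int a, unsigned int b) { return a > b ? a : b; }
static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }

int main(void)
{
	unsigned int two_truesize = 4608;	/* 2 * skb->truesize             */
	unsigned int pacing       = 39126;	/* sk->sk_pacing_rate >> 10      */
	unsigned int sysctl       = 131072;	/* sysctl_tcp_limit_output_bytes */

	/* old code: the sysctl acts as a floor on the limit */
	unsigned int old_limit = max_u(sysctl, pacing);			/* 131072 */

	/* new code: the sysctl acts as a cap on the limit */
	unsigned int new_limit = min_u(max_u(two_truesize, pacing), sysctl);	/* 39126 */

	printf("old=%u new=%u\n", old_limit, new_limit);
	return 0;
}

So yes: the sysctl used to be a lower bound on the limit and is now an
upper bound, which is why raising it no longer changes anything.
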
The intention is to limit the queue to about 1 ms of buffering, capped
by a configurable value.
On a 40Gbps flow, 1 ms represents 5 MB, which is insane.
We do not want to queue 5 MB of traffic; that would destroy latencies
for all concurrent flows. (Or it would require fq_codel or fq as the
packet scheduler, instead of the default pfifo_fast.)
This is why a 1.5 ms delay between transmit and TX completion is a
problem in your case.
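
Back-of-the-envelope, as a quick sketch (sk_pacing_rate is in bytes per
second, so rate >> 10 is roughly the amount of data sent in 1 ms):

#include <stdio.h>

int main(void)
{
	/* 40 Gbit/s expressed in bytes per second */
	unsigned long long rate_bits  = 40ULL * 1000 * 1000 * 1000;
	unsigned long long rate_bytes = rate_bits / 8;		/* 5 GB/s  */

	/* >> 10 is roughly divide-by-1000: about 1 ms worth of data */
	unsigned long long one_ms     = rate_bytes >> 10;	/* ~4.8 MB */

	printf("~%llu bytes buffered for 1 ms at 40Gbps\n", one_ms);
	return 0;
}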