Message-ID: <1423230001.31870.128.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Fri, 06 Feb 2015 05:40:01 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Michal Kazior <michal.kazior@...to.com>
Cc: Neal Cardwell <ncardwell@...gle.com>,
linux-wireless <linux-wireless@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
eyalpe@....mellanox.co.il
Subject: Re: Throughput regression with `tcp: refine TSO autosizing`
On Fri, 2015-02-06 at 10:42 +0100, Michal Kazior wrote:
> The above brings back previous behaviour, i.e. I can get 600mbps TCP
> on 5 flows again. Single flow is still (as it was before TSO
> autosizing) limited to roughly ~280mbps.
>
> I never really bothered before to understand why I need to push a few
> flows through ath10k to max it out, i.e. with a single UDP flow I
> get ~300mbps, while with, e.g., 5 flows I easily get 670mbps.
>
For a single UDP flow, tweaking /proc/sys/net/core/wmem_default might be
enough: UDP has no callback from TX completion to feed the following frames
(no write queue like TCP has).
# cat /proc/sys/net/core/wmem_default
212992
# ethtool -C eth1 tx-usecs 1024 tx-frames 120
# ./netperf -H remote -t UDP_STREAM -- -m 1450
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992    1450   10.00      697705      0     809.27
212992           10.00      673412            781.09
# echo 800000 >/proc/sys/net/core/wmem_default
# ./netperf -H remote -t UDP_STREAM -- -m 1450
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

800000    1450   10.00     7329221      0    8501.84
212992           10.00     7284051           8449.44
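The same effect can also be had per socket with SO_SNDBUF instead of the
global sysctl. A minimal userspace sketch (the 800000 value mirrors the
experiment above; note the kernel caps the request at
/proc/sys/net/core/wmem_max and doubles it internally for bookkeeping
overhead, so check the effective value with getsockopt()):

#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	int sndbuf = 800000;           /* mirrors the sysctl experiment */
	socklen_t len = sizeof(sndbuf);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
				 &sndbuf, sizeof(sndbuf)) < 0) {
		perror("socket/setsockopt");
		return 1;
	}

	/* The kernel reports the doubled, possibly clamped value. */
	getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
	printf("effective send buffer: %d bytes\n", sndbuf);
	return 0;
}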
> I guess it was the tx completion latency all along.
>
> I just added some extra debugging to ath10k to see the latency between
> submission and completion. Here's a log
> (http://www.filedropper.com/complete-log) of a 2s run of UDP iperf
> trying to push 1gbps but managing only 300mbps.
>
> I've made sure not to hold any locks nor introduce any ath10k-internal
> delays. Frames get completed in 2-4ms on average during load.
tcp_wfree() could maintain an EWMA of the TX completion delay in
tp->tx_completion_delay_ms. But this would require yet another expensive
call to ktime_get() if HZ < 1000, since jiffies are too coarse there to
measure millisecond delays.
Then tcp_write_xmit() could use it to adjust:

limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 9);

to

amount = (2 + tp->tx_completion_delay_ms) * sk->sk_pacing_rate;
limit = max(2 * skb->truesize, amount / 1000);
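To make the scaling concrete, here is a standalone arithmetic sketch of
the two formulas. The values are illustrative, not from this thread: a
100 Mbit/s pacing rate, a typical small skb truesize, and a hypothetical
3 ms completion-delay EWMA (tp->tx_completion_delay_ms does not exist
yet):

#include <stdio.h>

int main(void)
{
	unsigned long pacing_rate = 100 * 1000 * 1000 / 8; /* 100 Mbit/s in bytes/sec */
	unsigned long truesize = 2048;                     /* typical small skb */
	unsigned long delay_ms = 3;                        /* hypothetical EWMA value */

	/* Current cap: ~2 ms worth of data (rate >> 9 == rate / 512). */
	unsigned long cur = pacing_rate >> 9;
	if (cur < 2 * truesize)
		cur = 2 * truesize;

	/* Proposed cap: (2 + completion delay) ms worth of data. */
	unsigned long limit = (2 + delay_ms) * pacing_rate / 1000;
	if (limit < 2 * truesize)
		limit = 2 * truesize;

	printf("current limit : %lu bytes\n", cur);   /* 24414 */
	printf("proposed limit: %lu bytes\n", limit); /* 62500 */
	return 0;
}

So with a 3 ms completion delay the sender may keep roughly 2.5x more
data in flight toward the driver, instead of stalling while it waits for
TX completions.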
I'll cook a patch.
Thanks.