[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+BoTQn3KRym0qCJ+fLav5LQgjfiN_-Eqz0xPSc2eXJ-ijNjQw@mail.gmail.com>
Date: Mon, 9 Feb 2015 14:47:56 +0100
From: Michal Kazior <michal.kazior@...to.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Neal Cardwell <ncardwell@...gle.com>,
linux-wireless <linux-wireless@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
Eyal Perry <eyalpe@....mellanox.co.il>
Subject: Re: Throughput regression with `tcp: refine TSO autosizing`
On 6 February 2015 at 15:09, Michal Kazior <michal.kazior@...to.com> wrote:
> On 6 February 2015 at 14:53, Eric Dumazet <eric.dumazet@...il.com> wrote:
>> On Fri, 2015-02-06 at 05:40 -0800, Eric Dumazet wrote:
>>
>>> tcp_wfree() could maintain in tp->tx_completion_delay_ms an EWMA
>>> of TX completion delay. But this would require yet another expensive
>>> call to ktime_get() if HZ < 1000.
>>>
>>> Then tcp_write_xmit() could use it to adjust :
>>>
>>> limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 9);
>>>
>>> to
>>>
>>> amount = (2 + tp->tx_completion_delay_ms) * sk->sk_pacing_rate
>>>
>>> limit = max(2 * skb->truesize, amount / 1000);
>>>
>>> I'll cook a patch.
>>
>> Hmm... doing this in all protocols would be too expensive,
>> and we do not want to include time spent in qdiscs.
>>
>> wifi could eventually do that, providing in skb->tx_completion_delay_us
>> the time spent in wifi driver.
>>
>> This way, we would have no penalty for network devices doing normal skb
>> orphaning (loopback interface, ethernet, ...)
>
> I'll play around with this idea and report back later.
I'm able to get 600mbps with 5 flows and 250mbps with 1 flow, i.e.
same as before the regression. I'm attaching the patch at the end of
my mail - is this approach viable?
I wonder if there's anything that can be done to allow 600mbps (line
rate) on 1 flow with ath10k without tweaking tcp_limit_output_bytes
(you can't expect end-users to tweak this).
Perhaps tcp_limit_output_bytes should also consider tx_completion_delay, e.g.:
amount = sk->sk_tx_completion_delay_us;
amount *= sk->sk_pacing_rate >> 10;
limit = max(2 * skb->truesize, amount >> 10);
max_limit = sysctl_tcp_limit_output_bytes;
max_limit *= 1 + (sk->sk_tx_completion_delay_us / USEC_PER_MSEC);
limit = min(u32, limit, max_limit);
With this I get ~400mbps on 1 flow. If I add the original 1ms extra
delay from your formula to tx_completion_delay I fill in ath10k I get
nearly line rate in 1 flow (almost 600mbps; it hops between 570-620).
Decreasing tcp_limit_output_bytes decreases throughput (e.g. 64K gives
300mbps, 32K gives 180mbps, 16K gives 110mbps). Multiple flows in
iperf seem unbalanced with 128K limit, but look okay with 32K).
MichaĆ
diff --git a/drivers/net/wireless/ath/ath10k/core.h
b/drivers/net/wireless/ath/ath10k/core.h
index 3be3a59..4ff0ae8 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -82,6 +82,7 @@ struct ath10k_skb_cb {
dma_addr_t paddr;
u8 eid;
u8 vdev_id;
+ ktime_t stamp;
struct {
u8 tid;
diff --git a/drivers/net/wireless/ath/ath10k/mac.c
b/drivers/net/wireless/ath/ath10k/mac.c
index 15e47f4..5efb2a7 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -2620,6 +2620,7 @@ static void ath10k_tx(struct ieee80211_hw *hw,
if (info->flags & IEEE80211_TX_CTL_NO_CCK_RATE)
ath10k_dbg(ar, ATH10K_DBG_MAC,
"IEEE80211_TX_CTL_NO_CCK_RATE\n");
+ ATH10K_SKB_CB(skb)->stamp = ktime_get();
ATH10K_SKB_CB(skb)->htt.is_offchan = false;
ATH10K_SKB_CB(skb)->htt.tid = ath10k_tx_h_get_tid(hdr);
ATH10K_SKB_CB(skb)->vdev_id = ath10k_tx_h_get_vdev_id(ar, vif);
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c
b/drivers/net/wireless/ath/ath10k/txrx.c
index 3f00cec..0d5539b 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -15,6 +15,7 @@
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
+#include <net/sock.h>
#include "core.h"
#include "txrx.h"
#include "htt.h"
@@ -82,6 +83,13 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
ath10k_report_offchan_tx(htt->ar, msdu);
+ if (msdu->sk) {
+ ACCESS_ONCE(msdu->sk->sk_tx_completion_delay_us) =
+ ktime_to_ns(ktime_sub(ktime_get(),
+ skb_cb->stamp)) /
+ NSEC_PER_USEC;
+ }
+
info = IEEE80211_SKB_CB(msdu);
memset(&info->status, 0, sizeof(info->status));
trace_ath10k_txrx_tx_unref(ar, tx_done->msdu_id);
diff --git a/include/net/sock.h b/include/net/sock.h
index 2210fec..6b15d71 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -390,6 +390,7 @@ struct sock {
int sk_wmem_queued;
gfp_t sk_allocation;
u32 sk_pacing_rate; /* bytes per second */
+ u32 sk_tx_completion_delay_us;
u32 sk_max_pacing_rate;
netdev_features_t sk_route_caps;
netdev_features_t sk_route_nocaps;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 65caf8b..5e249bf 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1996,6 +1996,7 @@ static bool tcp_write_xmit(struct sock *sk,
unsigned int mss_now, int nonagle,
max_segs = tcp_tso_autosize(sk, mss_now);
while ((skb = tcp_send_head(sk))) {
unsigned int limit;
+ unsigned int amount;
tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
BUG_ON(!tso_segs);
@@ -2053,7 +2054,9 @@ static bool tcp_write_xmit(struct sock *sk,
unsigned int mss_now, int nonagle,
* of queued bytes to ensure line rate.
* One example is wifi aggregation (802.11 AMPDU)
*/
- limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+ amount = sk->sk_tx_completion_delay_us *
+ (sk->sk_pacing_rate >> 10);
+ limit = max(2 * skb->truesize, amount >> 10);
limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
if (atomic_read(&sk->sk_wmem_alloc) > limit) {
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists