Message-ID: <CA+BoTQko60aGP7QG+ELrgRmU=OFBZVc9n1X-ecGX2rLBnULSTA@mail.gmail.com>
Date:	Tue, 3 Feb 2015 10:00:33 +0100
From:	Michal Kazior <michal.kazior@...to.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Ben Greear <greearb@...delatech.com>,
	linux-wireless <linux-wireless@...r.kernel.org>,
	Network Development <netdev@...r.kernel.org>,
	eyalpe@....mellanox.co.il
Subject: Re: Throughput regression with `tcp: refine TSO autosizing`

On 3 February 2015 at 00:06, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Mon, 2015-02-02 at 13:25 -0800, Ben Greear wrote:
>
>> It is a big throughput win to have fewer TCP ack packets on
>> wireless since it is a half-duplex environment.  Is there anything
>> we could improve so that we can have fewer acks and still get
>> good tcp stack behaviour?
>
> First apply TCP stretch ack fixes to the sender. There is no way to get
> good performance if the sender does not handle stretch ack.
>
> d6b1a8a92a14 tcp: fix timing issue in CUBIC slope calculation
> 9cd981dcf174 tcp: fix stretch ACK bugs in CUBIC
> c22bdca94782 tcp: fix stretch ACK bugs in Reno
> 814d488c6126 tcp: fix the timid additive increase on stretch ACKs
> e73ebb0881ea tcp: stretch ACK fixes prep
>
> Then, make sure you do not throttle ACK too long, especially if you hope
> to get Gbit line rate on a 4 ms RTT flow.
>
> GRO does not mean : send one ACK every ms, or after 3ms delay...
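
To put a rough number on the Gbit line rate / 4ms RTT point (just a
back-of-the-envelope sketch, nothing measured):

  # bandwidth-delay product for 1Gbit/s at 4ms RTT
  rate_bps = 1e9
  rtt_s = 0.004
  print(rate_bps * rtt_s / 8)  # ~500000 bytes (~500KB) must be in flight

so every extra millisecond the receiver sits on an ACK effectively adds
to that 4ms RTT and pushes the required window up further.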

I think it's worth pointing out that if you assume a 3-frame A-MSDU and
a 64-frame A-MPDU you get 192 frames (as far as TCP/IP is concerned) per
aggregation window. Assuming an effective 600mbps throughput:

 python> 1.0/((((600/8)*1024*1024)/1500)/(3*64))
 0.003663003663003663

This is probably the worst case, but it's still worth keeping in mind.
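
Spelled out with named variables (same assumptions as above: 600mbps
effective throughput, 1500-byte frames, 3*64 frames per aggregation
window; just a sketch of the same arithmetic):

  # time to transmit one full aggregation window at the assumed rate
  throughput_Bps = (600 / 8.0) * 1024 * 1024  # bytes/s, 1024-based as above
  frames_per_sec = throughput_Bps / 1500      # 1500-byte frames, TCP/IP view
  frames_per_window = 3 * 64                  # A-MSDU frames * A-MPDU frames
  print(frames_per_window / frames_per_sec)   # ~0.00366 s, i.e. ~3.7ms/window

So an ACK held back for ~3ms is on the order of one full aggregation
window at that rate.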

ath10k has a knob to tune the A-MSDU aggregation count. The default is
"3", which is what I've been testing with so far.

When I change it to "1" on the sender I get a 250->400mbps boost in TCP
with -P5 (number of parallel flows), but I see no difference with -P1.
Changing it to "1" on the receiver yields no difference. I can try
adding this configuration permutation to my future tests if you're
interested.

To give you an idea: using "1" on the sender degrades UDP throughput
(by as much as 690->500mbps in some cases).


Michał
