netdev - Re: BW regression after "tcp: refine TSO autosizing"


Message-ID: <1421723651.17892.3.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Mon, 19 Jan 2015 19:14:11 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Dave Taht <dave.taht@...il.com>
Cc:	Eyal Perry <eyalpe@...lanox.com>,
	Yuchung Cheng <ycheng@...gle.com>,
	Neal Cardwell <ncardwell@...gle.com>,
	Eyal Perry <eyalpe@....mellanox.co.il>,
	Or Gerlitz <gerlitz.or@...il.com>,
	Linux Netdev List <netdev@...r.kernel.org>,
	Amir Vadai <amirv@...lanox.com>,
	Yevgeny Petrilin <yevgenyp@...lanox.com>,
	Saeed Mahameed <saeedm@...lanox.com>,
	Ido Shamay <idos@...lanox.com>, Amir Ancel <amira@...lanox.com>
Subject: Re: BW regression after "tcp: refine TSO autosizing"

On Mon, 2015-01-19 at 18:37 -0800, Dave Taht wrote:
> On Mon, Jan 19, 2015 at 6:16 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > On Sun, 2015-01-18 at 23:40 +0200, Eyal Perry wrote:
> >
> >> So indeed, interrupt mitigation (tx-usecs 1 tx-frames 1) improves things
> >> for the "refined TSO autosizing" kernel (from 18.4Gbps to 19.7Gbps), but
> >> in the other kernel, the BW remains the same with and without the
> >> coalescing.
> >
> > OK thanks for testing.
> >
> > I believe the regression comes from the inability of cc to cope with
> > stretch acks.
> >
> > Nowadays on fast networks, each ACK packet acknowledges ~45 MSS, but
> > CUBIC (and other cc) got support for this only during slow start, with
> > commit 9f9843a751d0a2057f9f3d313886e7e5e6ebaac9
> > ("tcp: properly handle stretch acks in slow start")
> >
> > I guess it is time to also handle congestion avoidance phase.
> 
> Are you saying that at long last, delayed acks as we knew them are
> dead, dead, dead?

Sorry, I cannot parse what you are saying.

In case you missed it, this has nothing to do with delayed ACKs; it is
caused by GRO on the receiver.
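
To illustrate why stretch ACKs matter in congestion avoidance, here is a
minimal sketch of a simplified Reno-style model. It is not the kernel code
and not the patch discussed in this thread; the function names and the
per-ACK counter are assumptions made for illustration. It contrasts
counting one unit per ACK packet with counting every segment the ACK
covers.

```python
def cong_avoid_naive(cwnd, cwnd_cnt, acked):
    """Hypothetical model: credit one unit per ACK packet, ignoring `acked`."""
    cwnd_cnt += 1
    if cwnd_cnt >= cwnd:
        cwnd_cnt -= cwnd
        cwnd += 1
    return cwnd, cwnd_cnt


def cong_avoid_stretch(cwnd, cwnd_cnt, acked):
    """Hypothetical model: credit every segment covered by the ACK."""
    cwnd_cnt += acked
    if cwnd_cnt >= cwnd:
        delta = cwnd_cnt // cwnd      # whole windows' worth of credit
        cwnd_cnt -= delta * cwnd
        cwnd += delta
    return cwnd, cwnd_cnt


def simulate(step, cwnd, n_acks, acked_per_ack):
    """Feed n_acks identical ACKs through `step` and return the final cwnd."""
    cwnd_cnt = 0
    for _ in range(n_acks):
        cwnd, cwnd_cnt = step(cwnd, cwnd_cnt, acked_per_ack)
    return cwnd


if __name__ == "__main__":
    # 100 ACKs, each acknowledging ~45 MSS, starting from cwnd=100.
    print("naive:  ", simulate(cong_avoid_naive, 100, 100, 45))
    print("stretch:", simulate(cong_avoid_stretch, 100, 100, 45))
```

In this model the naive variant grows cwnd by a single segment over those
100 ACKs, although they acknowledge 4500 segments, while the stretch-aware
variant grows it in proportion to the data actually acknowledged.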


> 
> > With the following patch (very close to what we use here at Google) I
> > reached 37Gbps instead of 20Gbps:
> >
> > ethtool -C eth1 tx-usecs 4 tx-frames 4
> 
> What is the default here?

16 & 16, see my prior answer in this thread.

> 
> What happens with the default here?

ethtool -C eth1 tx-usecs 16 tx-frames 16
DUMP_TCP_INFO=1 ./netperf -H remote -T2,2 -t TCP_STREAM -l 20
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to remote () port 0 AF_INET : cpu bind
rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=60 rttvar=2
snd_ssthresh=179 cwnd=243 reordering=3 total_retrans=23 ca_state=0
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    20.00    22923.74   




> 
> >
> > DUMP_TCP_INFO=1 ./netperf -H remote -T2,2 -t TCP_STREAM -l 20
> > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to remote () port 0 AF_INET : cpu bind
> > rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=67 rttvar=6 snd_ssthresh=263 cwnd=265 reordering=3 total_retrans=4569 ca_state=0
> 
> The above statistics are not dumped by my netperf, and look extremely
> desirable to capture in netperf-wrapper. Is this a script parsing some
> other kernel data at the conclusion of the run, or a better netperf?

That's a 3-line patch to netperf, actually.
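
For anyone wanting to capture these counters in a wrapper script, a
minimal sketch of parsing the dumped line follows. It assumes the output
is a whitespace-separated list of key=value pairs with integer values, as
in the run shown above; the function name parse_tcp_info is hypothetical
and belongs neither to netperf nor to netperf-wrapper.

```python
def parse_tcp_info(line):
    """Parse 'key=value key=value ...' into a dict of integers.

    Tokens without '=' or with a non-integer value are skipped.
    """
    stats = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep and value.isdigit():
            stats[key] = int(value)
    return stats


if __name__ == "__main__":
    sample = ("rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=60 rttvar=2 "
              "snd_ssthresh=179 cwnd=243 reordering=3 total_retrans=23 "
              "ca_state=0")
    info = parse_tcp_info(sample)
    print(info["cwnd"], info["rtt"], info["total_retrans"])
```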

> 
> If ECN was on the bottleneck link, I imagine total_retrans would be 0,
> or are packets getting dropped in the kernel?

The receiver drops frames, because we are at the limit of what the NIC
can do on a single RX queue.


