netdev - Re: Question about LRO/GRO and TCP acknowledgements

Message-ID: <1307987726.8149.3312.camel@tardy>
Date:	Mon, 13 Jun 2011 10:55:26 -0700
From:	Rick Jones <rick.jones2@...com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Joris van Rantwijk <joris@...isvr.nl>, netdev@...r.kernel.org
Subject: Re: Question about LRO/GRO and TCP acknowledgements

On Sun, 2011-06-12 at 16:57 +0200, Eric Dumazet wrote:
> On Sunday, 12 June 2011 at 13:24 +0200, Joris van Rantwijk wrote:
> > On 2011-06-12, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > > So your concern is more a Sender side implementation missing this
> > > recommendation, not GRO per se...
> > 
> > Not really. The same RFC says:
> >   Specifically, an ACK SHOULD be generated for at least every
> >   second full-sized segment, ...
> > 
> 
> Well, SHOULD is not MUST.
> 
> 
> > I can see how the world may have been a better place if every sender
> > implemented Appropriate Byte Counting and TCP receivers were allowed to
> > send fewer ACKs. However, current reality is that ABC is optional,
> > disabled by default in Linux, and receivers are recommended to send one
> > ACK per two segments.
> > 
> 
> ABC might be nice for stacks that use byte counters for cwnd. We use
> segments.
> 
> > I suspect that GRO currently hurts throughput of isolated TCP
> > connections. This is based on a purely theoretic argument. I may be
> > wrong and I have absolutely no data to confirm my suspicion.
> > 
> > If you can point out the flaw in my reasoning, I would be greatly
> > relieved. Until then, I remain concerned that there may be something
> > wrong with GRO and TCP ACKs.
> 
> Think of GRO as a receiver-side facility against stress/load, typically
> in a datacenter.
> 
> GRO kicks in only when the receiver is overloaded; it can then coalesce
> several frames so that the TCP stack handles them in one run.

How is that affected by interrupt coalescing in the NIC and the sending
side doing TSO (and so, ostensibly sending back-to-back frames)?  Are we
assured that a NIC is updating its completion pointer on the rx ring
continuously rather than just before a coalesced interrupt?

Does GRO "never" kick in over a 1GbE link (making the handwaving
assumption that cores today are much faster than a 1GbE link on a bulk
transfer)?
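
As a back-of-the-envelope illustration of why I ask, one can estimate how
many full-sized frames arrive on the wire during one interrupt-coalescing
interval. This is only a sketch: the intervals below are made-up values
for illustration, not anything read from the e1000e driver or this NIC.

```python
# On-the-wire overhead per Ethernet frame, in bytes:
# 14 (header) + 4 (FCS) + 8 (preamble + SFD) + 12 (inter-frame gap)
WIRE_OVERHEAD_BYTES = 14 + 4 + 8 + 12


def frame_time_us(link_bps, mtu_bytes=1500):
    """Time one full-sized frame occupies the link, in microseconds."""
    return (mtu_bytes + WIRE_OVERHEAD_BYTES) * 8 / link_bps * 1e6


def frames_per_interval(link_bps, interval_us, mtu_bytes=1500):
    """Frames that can arrive back-to-back within one coalescing interval."""
    return interval_us / frame_time_us(link_bps, mtu_bytes)


if __name__ == "__main__":
    # Hypothetical coalescing intervals, chosen only for illustration.
    for interval_us in (20, 50, 125):
        n = frames_per_interval(1e9, interval_us)
        print(f"{interval_us:>4} us interval: ~{n:.1f} full-sized frames at 1GbE")
```

A full-sized frame takes about 12.3 microseconds at 1GbE, so if frames
really do arrive back-to-back (say, from a TSO burst), any interval longer
than a couple of frame times hands GRO more than one segment per poll,
regardless of how fast the core is.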

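It was just a quick and dirty test, but GRO does seem to have a small
positive effect when enabled on a 1GbE link on a system with "fast
processors". Throughput is essentially unchanged, while CPU utilization
and service demand are slightly lower with GRO on: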

raj@...dy:~/netperf2_trunk$ sudo ethtool -K eth1 gro off
raj@...dy:~/netperf2_trunk$ src/netperf -t TCP_MAERTS -H 192.168.1.3 -i 10,3 -c -- -k foo
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.3 (192.168.1.3) port 0 AF_INET : +/-2.500% @ 99% conf.  : histogram : demo
THROUGHPUT=935.07
LOCAL_INTERFACE_NAME=eth1
LOCAL_CPU_UTIL=16.64
LOCAL_SD=5.830
raj@...dy:~/netperf2_trunk$ sudo ethtool -K eth1 gro on
raj@...dy:~/netperf2_trunk$ src/netperf -t TCP_MAERTS -H 192.168.1.3 -i 10,3 -c -- -k foo
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.3 (192.168.1.3) port 0 AF_INET : +/-2.500% @ 99% conf.  : histogram : demo
THROUGHPUT=934.81
LOCAL_INTERFACE_NAME=eth1
LOCAL_CPU_UTIL=16.21
LOCAL_SD=5.684
raj@...dy:~/netperf2_trunk$ uname -a
Linux tardy 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux

The receiver system here has a 3.07 GHz W3550 in it and eth1 is a port
on an Intel 82571EB-based four-port card.

raj@...dy:~/netperf2_trunk$ ethtool -i eth1
driver: e1000e
version: 1.0.2-k4
firmware-version: 5.10-2
bus-info: 0000:2a:00.0
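
To put numbers on the difference between the two runs, here is a small
sketch that parses the KEY=VALUE output shown above and prints the
relative change from "gro off" to "gro on". The helper names are mine and
purely illustrative; the values are copied from the two runs.

```python
def parse_keyval(text):
    """Parse netperf-style KEY=VALUE lines; numeric values become floats."""
    result = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            result[key.strip()] = float(value)
        except ValueError:
            result[key.strip()] = value.strip()
    return result


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


GRO_OFF = """THROUGHPUT=935.07
LOCAL_INTERFACE_NAME=eth1
LOCAL_CPU_UTIL=16.64
LOCAL_SD=5.830"""

GRO_ON = """THROUGHPUT=934.81
LOCAL_INTERFACE_NAME=eth1
LOCAL_CPU_UTIL=16.21
LOCAL_SD=5.684"""

off = parse_keyval(GRO_OFF)
on = parse_keyval(GRO_ON)

for key in ("THROUGHPUT", "LOCAL_CPU_UTIL", "LOCAL_SD"):
    print(f"{key}: {percent_change(off[key], on[key]):+.2f}%")
```

CPU utilization and service demand come out roughly 2.5% lower with GRO
on. That is about the same size as the +/-2.5% confidence interval the
runs were asked for, so I would treat it as suggestive rather than
conclusive.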

> If the receiver is so loaded that more than 2 frames are coalesced in a
> NAPI run, it certainly helps not to allow the sender to increase its cwnd
> by more than one SMSS. We are probably right before packet drops anyway.

If we are indeed statistically certain we are right before packet drops
(or I suppose asserting pause) then shouldn't ECN get set by the GRO
code?
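
For reference, the cwnd effect being discussed can be sketched with a toy
slow-start model. The assumptions are mine: the sender counts segments and
adds one segment to cwnd per ACK received (no ABC), there is no loss, no
ssthresh and no delayed-ACK timer, and the initial window of 3 segments is
just an illustrative choice. The function name is hypothetical.

```python
import math


def rtts_to_reach(target_segments, segments_per_ack, initial_cwnd=3):
    """Round trips of slow start needed for cwnd to reach target_segments.

    Per RTT the sender emits cwnd segments, the receiver returns one ACK
    per `segments_per_ack` segments (rounded up), and the sender grows
    cwnd by one segment for each ACK it receives.
    """
    cwnd = initial_cwnd
    rtts = 0
    while cwnd < target_segments:
        acks = math.ceil(cwnd / segments_per_ack)
        cwnd += acks
        rtts += 1
    return rtts


if __name__ == "__main__":
    # 1 = ACK every segment, 2 = ACK every second segment,
    # larger values = ACKs thinned out by coalescing on the receiver.
    for segments_per_ack in (1, 2, 4, 8):
        rtts = rtts_to_reach(64, segments_per_ack)
        print(f"1 ACK per {segments_per_ack} segments: {rtts} RTTs to reach 64 segments")
```

In this model the fewer ACKs the receiver sends per window, the more round
trips the sender needs to open its window, which is the concern raised at
the start of the thread.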

rick
