Date:	Wed, 30 Jan 2008 17:07:28 -0600 (CST)
From:	Bruce Allen <ballen@...vity.phys.uwm.edu>
To:	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
cc:	netdev@...r.kernel.org,
	Carsten Aulbert <carsten.aulbert@....mpg.de>,
	Henning Fehrmann <henning.fehrmann@....mpg.de>,
	Bruce Allen <bruce.allen@....mpg.de>
Subject: RE: e1000 full-duplex TCP performance well below wire speed

Hi Jesse,

It's good to be talking directly to one of the e1000 developers and
maintainers, although at this point I am starting to think that the
issue may be related to the TCP stack and have nothing to do with the
NIC.  Am I correct that these are quite distinct parts of the kernel?

> The 82573L (a client NIC, regardless of the class of machine it is in)
> only has a x1 connection which does introduce some latency since the
> slot is only capable of about 2Gb/s data total, which includes overhead
> of descriptors and other transactions.  As you approach the maximum of
> the slot it gets more and more difficult to get wire speed in a
> bidirectional test.

According to the Intel datasheet, the PCI-e x1 connection is 2Gb/s in each 
direction.  So we only need to get up to 50% of peak to saturate a 
full-duplex wire-speed link.  I hope that the overhead is not a factor of 
two.
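
To spell out the arithmetic I have in mind (nominal PCI Express 1.x
figures; descriptor and TLP header overhead, the part you are warning
about, comes on top of the ~1 Gb/s of payload):

   PCIe 1.x x1 link:   2.5 GT/s raw, 8b/10b encoding
                       -> 2.0 Gb/s usable per direction
   full-duplex GigE:   ~1.0 Gb/s payload per direction
                       -> roughly 50% of the slot, before overhead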

Important note: we ARE able to get full-duplex wire speed (over 900 Mb/s
simultaneously in both directions) using UDP.  The problems occur only
with TCP connections.

>> The test was done with various mtu sizes ranging from 1500 to 9000,
>> with ethernet flow control switched on and off, and using reno and
>> cubic as a TCP congestion control.
>
> As asked in LKML thread, please post the exact netperf command used to
> start the client/server, whether or not you're using irqbalanced (aka
> irqbalance) and what cat /proc/interrupts looks like (you ARE using MSI,
> right?)

I have to wait until Carsten or Henning wake up tomorrow (now 23:38 in 
Germany).  So we'll provide this info in ~10 hours.
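
For what it's worth, I expect the runs were of roughly the following
form (two simultaneous netperf streams against a netserver on the peer
node, one stream per direction).  Please treat the host name and the
options as placeholders until they post the real command line:

   # peer node runs netserver; on the local node:
   netperf -H <peer> -t TCP_STREAM -l 60 &
   netperf -H <peer> -t TCP_MAERTS -l 60 &
   wait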

I assume that the interrupt load is distributed among all four cores
(the default affinity is 0xff), and that there is some type of
interrupt aggregation taking place in the driver.  If the CPUs were
not able to service the interrupts fast enough, I would expect us to
also see a loss of performance with the UDP testing.
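
Concretely, I will ask them to capture something like this, so we can
see how the e1000 interrupt is spread over the cores (N below stands
for whatever IRQ number /proc/interrupts reports for eth0):

   grep eth /proc/interrupts
   cat /proc/irq/N/smp_affinity    # per-IRQ CPU mask, default ff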

> I've recently discovered that particularly with the most recent kernels
> if you specify any socket options (-- -SX -sY) to netperf it does worse
> than if it just lets the kernel auto-tune.

I am pretty sure that no socket options were specified, but again need to 
wait until Carsten or Henning come back on-line.
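
In other words, the runs should have been plain invocations that leave
buffer sizing to the kernel, rather than something like the second
line below (the buffer sizes are only illustrative):

   netperf -H <peer> -t TCP_STREAM -l 60
   netperf -H <peer> -t TCP_STREAM -l 60 -- -s 262144 -S 262144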

>> The behavior depends on the setup. In one test we used cubic
>> congestion control, flow control off. The transfer rate in one
>> direction was above 0.9Gb/s while in the other direction it was 0.6
>> to 0.8 Gb/s. After 15-20s the rates flipped. Perhaps the two streams
>> are fighting for resources. (The performance of a full duplex stream
>> should be close to 1Gb/s in both directions.)  A graph of the
>> transfer speed as a function of time is here:
>> https://n0.aei.uni-hannover.de/networktest/node19-new20-noflow.jpg
>> Red shows transmit and green shows receive (please ignore other
>> plots):

> One other thing you can try with e1000 is disabling the dynamic
> interrupt moderation by loading the driver with
> InterruptThrottleRate=8000,8000,... (the number of commas depends on
> your number of ports) which might help in your particular benchmark.

OK.  Is 'dynamic interrupt moderation' another name for 'interrupt
aggregation', i.e. if more than one interrupt would be generated in a
given time interval, they are replaced by a single interrupt?
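
We'll give that a try.  Just so I have it written down correctly, I
assume applying it is simply a matter of reloading the driver with the
parameter you named (one value per port):

   rmmod e1000
   modprobe e1000 InterruptThrottleRate=8000,8000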

> just for completeness can you post the dump of ethtool -e eth0 and lspci
> -vvv?

Yup, we'll give that info also.

Thanks again!

Cheers,
 	Bruce
