Message-ID: <Pine.LNX.4.63.0801311251040.14403@trinity.phys.uwm.edu>
Date: Thu, 31 Jan 2008 13:13:57 -0600 (CST)
From: Bruce Allen <ballen@...vity.phys.uwm.edu>
To: "Kok, Auke" <auke-jan.h.kok@...el.com>
cc: "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
netdev@...r.kernel.org,
Carsten Aulbert <carsten.aulbert@....mpg.de>,
Henning Fehrmann <henning.fehrmann@....mpg.de>,
Bruce Allen <bruce.allen@....mpg.de>
Subject: Re: e1000 full-duplex TCP performance well below wire speed
Hi Auke,
>>>> Important note: we ARE able to get full duplex wire speed (over 900
>>>> Mb/s simultaneously in both directions) using UDP. The problems occur
>>>> only with TCP connections.
>>>
>>> That eliminates bus bandwidth issues, probably, but small packets take
>>> up a lot of extra descriptors, bus bandwidth, CPU, and cache resources.
>>
>> I see. Your concern is the extra ACK packets associated with TCP. Even
>> though these represent a small volume of data (around 5% with MTU=1500,
>> and less at larger MTU), they double the number of packets that must be
>> handled by the system compared to UDP transmission at the same data
>> rate. Is that correct?
>
> A lot of people tend to forget that the pci-express bus has enough
> bandwidth at first glance - 2.5gbit/sec for 1gbit of traffic, but apart
> from data going over it there is significant overhead going on: each
> packet requires transmit, cleanup and buffer transactions, and there are
> many irq register clears per second (slow ioread/writes). The
> transactions double for TCP ack processing, and this all accumulates and
> starts to introduce latency, higher cpu utilization etc...
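
Before getting to my main point: as a quick sanity check on the ACK
overhead numbers quoted above, here is a back-of-the-envelope Python
sketch (mine, not a measurement). It assumes plain IPv4/TCP headers with
no options and, as a worst case, one ACK per full-sized data segment:

#!/usr/bin/env python
# Rough check of the TCP ACK overhead discussed above.  Assumptions
# (mine): IPv4 + TCP headers with no options, one ACK per full-sized
# data segment (no delayed-ACK coalescing), and per-frame Ethernet
# overhead of header + FCS + preamble + inter-frame gap.

ETH_OVERHEAD = 14 + 4 + 8 + 12   # Ethernet header, FCS, preamble, IFG
IP_TCP_HDR = 20 + 20             # IPv4 header + TCP header, no options

def wire_bytes(tcp_payload):
    """Bytes one frame occupies on the wire for a given TCP payload."""
    return ETH_OVERHEAD + IP_TCP_HDR + tcp_payload

for mtu in (1500, 3000, 9000):
    payload = mtu - IP_TCP_HDR           # payload of a full-sized segment
    data = wire_bytes(payload)
    ack = wire_bytes(0)                  # a pure ACK carries no payload
    print("MTU %5d: ACKs add about %.1f%% in bytes, but double the packet count"
          % (mtu, 100.0 * ack / data))

With MTU 1500 this gives roughly 5% extra bytes for the reverse-direction
ACKs, consistent with the number above, while the number of packets the
NIC and driver must handle doubles.
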
Based on the discussion in this thread, I am inclined to believe that lack
of PCI-e bus bandwidth is NOT the issue. The theory is that the extra
packet handling associated with TCP acknowledgements is pushing the PCI-e
x1 bus past its limits. However, the evidence seems to show otherwise:
(1) Bill Fink has reported the same problem on a NIC with a 133 MHz 64-bit
PCI connection. That connection can transfer data at roughly 8 Gb/s.
(2) If the theory is right, then doubling the MTU from 1500 to 3000 should
significantly reduce the problem, since it cuts the number of ACKs by a
factor of two. Similarly, going from MTU 1500 to MTU 9000 should cut the
number of ACKs by a factor of six, practically eliminating the problem.
But changing the MTU size does not help. (See the arithmetic sketch after
this list.)
(3) The interrupt counts are quite reasonable. Broadcom NICs without
interrupt aggregation generate an order of magnitude more irq/s, and that
does not prevent wire-speed performance there.
(4) The CPUs on the system are largely idle. There are plenty of
computing resources available.
(5) I don't think that the overhead will increase the bandwidth needed by
more than a factor of two. Of course you and the other e1000 developers
are the experts, but the dominant bus cost should be copying data buffers
across the bus. Everything else is minimal in comparison (again, see the
sketch below).
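
To make points (1), (2) and (5) concrete, here is the rough arithmetic I
have in mind, again as a small Python sketch. The per-packet descriptor
and write-back overhead is a guessed constant, not a number taken from the
e1000 documentation, so treat the bus-load figures as rough estimates only:

#!/usr/bin/env python
# Rough bus-bandwidth arithmetic behind points (1), (2) and (5).  The
# per-packet descriptor overhead (DESC_BYTES) is a guess, not a figure
# from the e1000 documentation.

GBIT = 1e9

# Point (1): raw capacity of a 64-bit / 133 MHz PCI(-X) slot.
pcix_raw = 64 * 133e6 / GBIT            # about 8.5 Gb/s
# PCI-e x1: 2.5 Gb/s per direction raw, ~2 Gb/s after 8b/10b encoding.
pcie_x1 = 2.5 * 8.0 / 10.0

goodput = 0.94 * GBIT                   # ~940 Mb/s of TCP payload, one way
DESC_BYTES = 64                         # guessed descriptor + write-back
                                        # bus traffic per packet
ACK_FRAME = 78                          # minimal TCP ACK frame on the wire

print("PCI-X 64/133 raw:   %.1f Gb/s" % pcix_raw)
print("PCI-e x1 effective: %.1f Gb/s per direction" % pcie_x1)

for mtu in (1500, 3000, 9000):
    payload = mtu - 40                  # TCP payload per full-sized segment
    data_pps = goodput / (8.0 * payload)
    ack_pps = data_pps                  # worst case: one ACK per segment
    bus = (data_pps * 8 * (mtu + DESC_BYTES)
           + ack_pps * 8 * (ACK_FRAME + DESC_BYTES)) / GBIT
    print("MTU %5d: %5.1f kpkt/s data + %5.1f kpkt/s ACK -> ~%.2f Gb/s on the bus"
          % (mtu, data_pps / 1e3, ack_pps / 1e3, bus))

At MTU 9000 the packet rate drops by a factor of six, so any per-packet
cost should shrink accordingly (point 2), and even at MTU 1500 the byte
load comes out around 1.1 Gb/s, comfortably below the ~2 Gb/s a PCI-e x1
link provides in each direction (point 5).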
Intel insiders: isn't there some simple instrumentation available (which
reads registers or statistics counters on the PCI-e interface chip) to
tell us statistics such as how many bits have moved over the link in each
direction? This, plus some accurate timing, would make it easy to see
whether the TCP case is saturating the PCI-e bus. Then the theory could be
tested with data rather than with opinions.
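
In the meantime, the closest I know how to get from the host side is to
sample the kernel's per-interface counters. That measures what the NIC
and driver report, not transactions on the PCI-e link itself, but it at
least gives accurate byte and packet rates to compare against. A minimal
sketch (the interface name and interval are placeholders):

#!/usr/bin/env python
# Sample the standard sysfs statistics counters for one interface over a
# fixed interval and print average rates.  This reports NIC/driver-level
# counters, NOT PCI-e link transactions, so it is only a cross-check.
import time

IFACE = "eth0"      # placeholder interface name
INTERVAL = 10       # seconds to average over
COUNTERS = ("rx_bytes", "tx_bytes", "rx_packets", "tx_packets")

def read_counters(iface):
    vals = {}
    for name in COUNTERS:
        path = "/sys/class/net/%s/statistics/%s" % (iface, name)
        vals[name] = int(open(path).read())
    return vals

before = read_counters(IFACE)
t0 = time.time()
time.sleep(INTERVAL)
after = read_counters(IFACE)
dt = time.time() - t0

for name in COUNTERS:
    rate = (after[name] - before[name]) / dt
    if name.endswith("bytes"):
        print("%-11s %8.1f Mbit/s" % (name, rate * 8e-6))
    else:
        print("%-11s %8.1f kpkt/s" % (name, rate * 1e-3))

Running "ethtool -S" on the interface dumps the driver's own statistics
block as well, though as far as I know that still says nothing about what
crosses the PCI-e link.
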
Cheers,
Bruce
--