Message-ID: <Pine.LNX.4.63.0801311251040.14403@trinity.phys.uwm.edu>
Date: Thu, 31 Jan 2008 13:13:57 -0600 (CST)
From: Bruce Allen <ballen@...vity.phys.uwm.edu>
To: "Kok, Auke" <auke-jan.h.kok@...el.com>
cc: "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
netdev@...r.kernel.org,
Carsten Aulbert <carsten.aulbert@....mpg.de>,
Henning Fehrmann <henning.fehrmann@....mpg.de>,
Bruce Allen <bruce.allen@....mpg.de>
Subject: Re: e1000 full-duplex TCP performance well below wire speed
Hi Auke,
>>>> Important note: we ARE able to get full duplex wire speed (over 900
>>>> Mb/s simultaneously in both directions) using UDP. The problems occur
>>>> only with TCP connections.
>>>
>>> That eliminates bus bandwidth issues, probably, but small packets take
>>> up a lot of extra descriptors, bus bandwidth, CPU, and cache resources.
>>
>> I see. Your concern is the extra ACK packets associated with TCP. Even
>> though these represent a small volume of data (around 5% with MTU=1500,
>> and less at larger MTU), they double the number of packets that must be
>> handled by the system compared to UDP transmission at the same data
>> rate. Is that correct?
>
> A lot of people tend to forget that the pci-express bus has enough
> bandwidth at first glance - 2.5gbit/sec for 1gbit of traffic, but apart
> from data going over it there is significant overhead going on: each
> packet requires transmit, cleanup and buffer transactions, and there are
> many irq register clears per second (slow ioread/writes). The
> transactions double for TCP ack processing, and this all accumulates and
> starts to introduce latency, higher cpu utilization etc...
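
Before getting to my main point: as a quick sanity check on the ACK
overhead numbers quoted above, here is a back-of-the-envelope Python
sketch (mine, not a measurement). It assumes plain IPv4/TCP headers with
no options and, as a worst case, one ACK per full-sized data segment:

#!/usr/bin/env python
# Rough check of the TCP ACK overhead discussed above.  Assumptions
# (mine): IPv4 + TCP headers with no options, one ACK per full-sized
# data segment (no delayed-ACK coalescing), and per-frame Ethernet
# overhead of header + FCS + preamble + inter-frame gap.

ETH_OVERHEAD = 14 + 4 + 8 + 12   # Ethernet header, FCS, preamble, IFG
IP_TCP_HDR = 20 + 20             # IPv4 header + TCP header, no options

def wire_bytes(tcp_payload):
    """Bytes one frame occupies on the wire for a given TCP payload."""
    return ETH_OVERHEAD + IP_TCP_HDR + tcp_payload

for mtu in (1500, 3000, 9000):
    payload = mtu - IP_TCP_HDR           # payload of a full-sized segment
    data = wire_bytes(payload)
    ack = wire_bytes(0)                  # a pure ACK carries no payload
    print("MTU %5d: ACKs add about %.1f%% in bytes, but double the packet count"
          % (mtu, 100.0 * ack / data))

With MTU 1500 this gives roughly 5% extra bytes for the reverse-direction
ACKs, consistent with the number above, while the number of packets the
NIC and driver must handle doubles.
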
Based on the discussion in this thread, I am inclined to believe that lack
of PCI-e bus bandwidth is NOT the issue. The theory is that the extra
packet handling associated with TCP acknowledgements is pushing the PCI-e
x1 bus past its limits. However, the evidence seems to show otherwise:
(1) Bill Fink has reported the same problem on a NIC with a 133 MHz 64-bit
PCI connection. That connection can transfer data at roughly 8 Gb/s.
(2) If the theory is right, then doubling the MTU from 1500 to 3000 should
significantly reduce the problem, since it cuts the number of ACKs by a
factor of two. Similarly, going from MTU 1500 to MTU 9000 should cut the
number of ACKs by a factor of six, practically eliminating the problem.
But changing the MTU size does not help. (See the arithmetic sketch after
this list.)
(3) The interrupt counts are quite reasonable. Broadcom NICs without
interrupt aggregation generate an order of magnitude more irq/s, and that
does not prevent wire-speed performance there.
(4) The CPUs on the system are largely idle. There are plenty of
computing resources available.
(5) I don't think that the overhead will increase the bandwidth needed by
more than a factor of two. Of course you and the other e1000 developers
are the experts, but the dominant bus cost should be copying data buffers
across the bus. Everything else is minimal in comparison (again, see the
sketch below).
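
To make points (1), (2) and (5) concrete, here is the rough arithmetic I
have in mind, again as a small Python sketch. The per-packet descriptor
and write-back overhead is a guessed constant, not a number taken from the
e1000 documentation, so treat the bus-load figures as rough estimates only:

#!/usr/bin/env python
# Rough bus-bandwidth arithmetic behind points (1), (2) and (5).  The
# per-packet descriptor overhead (DESC_BYTES) is a guess, not a figure
# from the e1000 documentation.

GBIT = 1e9

# Point (1): raw capacity of a 64-bit / 133 MHz PCI(-X) slot.
pcix_raw = 64 * 133e6 / GBIT            # about 8.5 Gb/s
# PCI-e x1: 2.5 Gb/s per direction raw, ~2 Gb/s after 8b/10b encoding.
pcie_x1 = 2.5 * 8.0 / 10.0

goodput = 0.94 * GBIT                   # ~940 Mb/s of TCP payload, one way
DESC_BYTES = 64                         # guessed descriptor + write-back
                                        # bus traffic per packet
ACK_FRAME = 78                          # minimal TCP ACK frame on the wire

print("PCI-X 64/133 raw:   %.1f Gb/s" % pcix_raw)
print("PCI-e x1 effective: %.1f Gb/s per direction" % pcie_x1)

for mtu in (1500, 3000, 9000):
    payload = mtu - 40                  # TCP payload per full-sized segment
    data_pps = goodput / (8.0 * payload)
    ack_pps = data_pps                  # worst case: one ACK per segment
    bus = (data_pps * 8 * (mtu + DESC_BYTES)
           + ack_pps * 8 * (ACK_FRAME + DESC_BYTES)) / GBIT
    print("MTU %5d: %5.1f kpkt/s data + %5.1f kpkt/s ACK -> ~%.2f Gb/s on the bus"
          % (mtu, data_pps / 1e3, ack_pps / 1e3, bus))

At MTU 9000 the packet rate drops by a factor of six, so any per-packet
cost should shrink accordingly (point 2), and even at MTU 1500 the byte
load comes out around 1.1 Gb/s, comfortably below the ~2 Gb/s a PCI-e x1
link provides in each direction (point 5).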
Intel insiders: isn't there some simple instrumentation available (which
reads registers or statistics counters on the PCI-e interface chip) to
tell us statistics such as how many bits have moved over the link in each
direction? This, plus some accurate timing, would make it easy to see
whether the TCP case is saturating the PCI-e bus. Then the theory could be
tested with data rather than with opinions.
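
In the meantime, the closest I know how to get from the host side is to
sample the kernel's per-interface counters. That measures what the NIC
and driver report, not transactions on the PCI-e link itself, but it at
least gives accurate byte and packet rates to compare against. A minimal
sketch (the interface name and interval are placeholders):

#!/usr/bin/env python
# Sample the standard sysfs statistics counters for one interface over a
# fixed interval and print average rates.  This reports NIC/driver-level
# counters, NOT PCI-e link transactions, so it is only a cross-check.
import time

IFACE = "eth0"      # placeholder interface name
INTERVAL = 10       # seconds to average over
COUNTERS = ("rx_bytes", "tx_bytes", "rx_packets", "tx_packets")

def read_counters(iface):
    vals = {}
    for name in COUNTERS:
        path = "/sys/class/net/%s/statistics/%s" % (iface, name)
        vals[name] = int(open(path).read())
    return vals

before = read_counters(IFACE)
t0 = time.time()
time.sleep(INTERVAL)
after = read_counters(IFACE)
dt = time.time() - t0

for name in COUNTERS:
    rate = (after[name] - before[name]) / dt
    if name.endswith("bytes"):
        print("%-11s %8.1f Mbit/s" % (name, rate * 8e-6))
    else:
        print("%-11s %8.1f kpkt/s" % (name, rate * 1e-3))

Running "ethtool -S" on the interface dumps the driver's own statistics
block as well, though as far as I know that still says nothing about what
crosses the PCI-e link.
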
Cheers,
Bruce
--