Message-ID: <Pine.LNX.4.63.0801310213010.3240@trinity.phys.uwm.edu>
Date: Thu, 31 Jan 2008 02:31:25 -0600 (CST)
From: Bruce Allen <ballen@...vity.phys.uwm.edu>
To: "Brandeburg, Jesse" <jesse.brandeburg@...el.com>
cc: netdev@...r.kernel.org,
Carsten Aulbert <carsten.aulbert@....mpg.de>,
Henning Fehrmann <henning.fehrmann@....mpg.de>,
Bruce Allen <bruce.allen@....mpg.de>
Subject: RE: e1000 full-duplex TCP performance well below wire speed
Hi Jesse,
>> It's good to be talking directly to one of the e1000 developers and
>> maintainers. Although at this point I am starting to think that the
>> issue may be TCP-stack related and have nothing to do with the NIC.
>> Am I correct that these are quite distinct parts of the kernel?
>
> Yes, quite.
OK. I hope that there is also someone knowledgeable about the TCP stack
who is following this thread. (Perhaps you also know this part of the
kernel, but I am assuming that your expertise is on the e1000/NIC bits.)
>> Important note: we ARE able to get full duplex wire speed (over 900
>> Mb/s simultaneously in both directions) using UDP. The problems occur
>> only with TCP connections.
>
> That eliminates bus bandwidth issues, probably, but small packets take
> up a lot of extra descriptors, bus bandwidth, CPU, and cache resources.
I see. Your concern is the extra ACK packets associated with TCP. Even
though these represent a small volume of data (around 5% with MTU=1500,
and less at larger MTU) they double the number of packets that must be
handled by the system compared to UDP transmission at the same data rate.
Is that correct?
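(Back-of-envelope, if I have the frame accounting right: with MTU=1500 a
full-size data segment occupies about 1538 bytes on the wire once the
Ethernet header, FCS, preamble and inter-frame gap are counted, while a
bare ACK is a minimum-size frame of roughly 84 bytes. One ACK per data
segment then costs about 84/1538, i.e. roughly 5% as much bandwidth as the
data itself, and about half that with delayed ACKs. Either way it is one
extra packet to process for every one or two data packets.)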
>> I have to wait until Carsten or Henning wake up tomorrow (now 23:38 in
>> Germany). So we'll provide this info in ~10 hours.
>
> I would suggest you try TCP_RR with a command line something like this:
> netperf -t TCP_RR -H <hostname> -C -c -- -b 4 -r 64K
>
> I think you'll have to compile netperf with burst mode support enabled.
I just saw Carsten a few minutes ago. He has to take part in a
'Baubesprechung' (construction planning) meeting this morning, after which
he will start answering the technical questions and doing additional
testing as suggested by you and others. If you are on the US west coast,
he should have some answers and results posted by Thursday morning Pacific
time.
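For Carsten's reference, my understanding is that burst mode has to be
enabled when netperf is built, so the test would look roughly like the
following (the configure flag is my assumption; I have not checked it
against the netperf documentation):

   ./configure --enable-burst && make
   netperf -t TCP_RR -H <hostname> -C -c -- -b 4 -r 64K

where -c/-C report local/remote CPU utilization, -b 4 keeps several
transactions outstanding at once, and -r 64K sets the transaction size, if
I read the options correctly.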
>> I assume that the interrupt load is distributed among all four cores
>> -- the default affinity is 0xff, and I also assume that there is some
>> type of interrupt aggregation taking place in the driver. If the
>> CPUs were not able to service the interrupts fast enough, I assume
>> that we would also see loss of performance with UDP testing.
>>
>>> One other thing you can try with e1000 is disabling the dynamic
>>> interrupt moderation by loading the driver with
>>> InterruptThrottleRate=8000,8000,... (the number of commas depends on
>>> your number of ports) which might help in your particular benchmark.
>>
>> OK. Is 'dynamic interrupt moderation' another name for 'interrupt
>> aggregation'? Meaning that if more than one interrupt is generated
>> in a given time interval, then they are replaced by a single
>> interrupt?
>
> Yes, InterruptThrottleRate=8000 means there will be no more than 8000
> ints/second from that adapter, and if interrupts are generated faster
> than that they are "aggregated."
>
> Interestingly, since you are interested in ultra low latency and may be
> willing to give up some CPU for it during bulk transfers, you should try
> InterruptThrottleRate=1 (can generate up to 70000 ints/s)
I'm not sure it's quite right to say that we are interested in ultra low
latency. Most of our network transfers involve bulk data movement (a few
MB or more). We don't care so much about low latency (meaning how long it
takes the FIRST byte of data to travel from sender to receiver). We care
about aggregate bandwidth: once the pipe is full, how fast data can be
moved through it. So we don't care so much if getting the pipe full takes
20 us or 50 us. We just want the data to flow fast once the pipe IS full.
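When Carsten reruns the tests we can try both throttle settings. If I
understand the mechanism, fixing the rate at 8000 just means reloading the
driver with the option you gave, i.e. something along these lines
(assuming the ports are handled by the e1000 module and the usual
module-parameter syntax applies):

   rmmod e1000
   modprobe e1000 InterruptThrottleRate=8000,8000

with one comma-separated value per port, as you describe, or =1,1 for the
setting you suggest for low latency.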
> Welcome, it's an interesting discussion. Hope we can come to a good
> conclusion.
Thank you. Carsten will post more info and answers later today.
Cheers,
Bruce
--