Message-ID: <Pine.LNX.4.63.0801310213010.3240@trinity.phys.uwm.edu>
Date: Thu, 31 Jan 2008 02:31:25 -0600 (CST)
From: Bruce Allen <ballen@...vity.phys.uwm.edu>
To: "Brandeburg, Jesse" <jesse.brandeburg@...el.com>
cc: netdev@...r.kernel.org,
Carsten Aulbert <carsten.aulbert@....mpg.de>,
Henning Fehrmann <henning.fehrmann@....mpg.de>,
Bruce Allen <bruce.allen@....mpg.de>
Subject: RE: e1000 full-duplex TCP performance well below wire speed
Hi Jesse,
>> It's good to be talking directly to one of the e1000 developers and
>> maintainers. Although at this point I am starting to think that the
>> issue may be TCP-stack related and have nothing to do with the NIC.
>> Am I correct that these are quite distinct parts of the kernel?
>
> Yes, quite.
OK. I hope that there is also someone knowledgeable about the TCP stack
who is following this thread. (Perhaps you also know this part of the
kernel, but I am assuming that your expertise is on the e1000/NIC bits.)
>> Important note: we ARE able to get full duplex wire speed (over 900
>> Mb/s simultaneously in both directions) using UDP. The problems occur
>> only with TCP connections.
>
> That eliminates bus bandwidth issues, probably, but small packets take
> up a lot of extra descriptors, bus bandwidth, CPU, and cache resources.
I see. Your concern is the extra ACK packets associated with TCP. Even
though these represent a small volume of data (around 5% with MTU=1500,
and less at larger MTU) they double the number of packets that must be
handled by the system compared to UDP transmission at the same data rate.
Is that correct?
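(Back-of-envelope, if I have the frame accounting right: with MTU=1500 a
full-size data segment occupies about 1538 bytes on the wire once the
Ethernet header, FCS, preamble and inter-frame gap are counted, while a
bare ACK is a minimum-size frame of roughly 84 bytes. One ACK per data
segment then costs about 84/1538, i.e. roughly 5% as much bandwidth as the
data itself, and about half that with delayed ACKs. Either way it is one
extra packet to process for every one or two data packets.)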
>> I have to wait until Carsten or Henning wake up tomorrow (now 23:38 in
>> Germany). So we'll provide this info in ~10 hours.
>
> I would suggest you try TCP_RR with a command line something like this:
> netperf -t TCP_RR -H <hostname> -C -c -- -b 4 -r 64K
>
> I think you'll have to compile netperf with burst mode support enabled.
I just saw Carsten a few minutes ago. He has to take part in a
'Baubesprechung' (construction planning) meeting this morning, after which
he will start answering the technical questions and doing additional
testing as suggested by you and others. If you are on the US west coast,
he should have some answers and results posted by Thursday morning Pacific
time.
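For Carsten's reference, my understanding is that burst mode has to be
enabled when netperf is built, so the test would look roughly like the
following (the configure flag is my assumption; I have not checked it
against the netperf documentation):

   ./configure --enable-burst && make
   netperf -t TCP_RR -H <hostname> -C -c -- -b 4 -r 64K

where -c/-C report local/remote CPU utilization, -b 4 keeps several
transactions outstanding at once, and -r 64K sets the transaction size, if
I read the options correctly.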
>> I assume that the interrupt load is distributed among all four cores
>> -- the default affinity is 0xff, and I also assume that there is some
>> type of interrupt aggregation taking place in the driver. If the
>> CPUs were not able to service the interrupts fast enough, I assume
>> that we would also see loss of performance with UDP testing.
>>
>>> One other thing you can try with e1000 is disabling the dynamic
>>> interrupt moderation by loading the driver with
>>> InterruptThrottleRate=8000,8000,... (the number of commas depends on
>>> your number of ports) which might help in your particular benchmark.
>>
>> OK. Is 'dynamic interrupt moderation' another name for 'interrupt
>> aggregation'? Meaning that if more than one interrupt is generated
>> in a given time interval, then they are replaced by a single
>> interrupt?
>
> Yes, InterruptThrottleRate=8000 means there will be no more than 8000
> ints/second from that adapter, and if interrupts are generated faster
> than that they are "aggregated."
>
> Interestingly, since you are interested in ultra low latency and may be
> willing to give up some CPU for it during bulk transfers, you should try
> InterruptThrottleRate=1 (can generate up to 70000 ints/s)
I'm not sure it's quite right to say that we are interested in ultra low
latency. Most of our network transfers involve bulk data movement (a few
MB or more). We don't care so much about low latency (meaning how long it
takes the FIRST byte of data to travel from sender to receiver). We care
about aggregate bandwidth: once the pipe is full, how fast data can be
moved through it. So we don't care so much if getting the pipe full takes
20 us or 50 us. We just want the data to flow fast once the pipe IS full.
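When Carsten reruns the tests we can try both throttle settings. If I
understand the mechanism, fixing the rate at 8000 just means reloading the
driver with the option you gave, i.e. something along these lines
(assuming the ports are handled by the e1000 module and the usual
module-parameter syntax applies):

   rmmod e1000
   modprobe e1000 InterruptThrottleRate=8000,8000

with one comma-separated value per port, as you describe, or =1,1 for the
setting you suggest for low latency.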
> Welcome, it's an interesting discussion. Hope we can come to a good
> conclusion.
Thank you. Carsten will post more info and answers later today.
Cheers,
Bruce
--