Date:	Thu, 10 Jan 2008 12:52:15 -0800
From:	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
To:	"Breno Leitao" <leitao@...ux.vnet.ibm.com>
Cc:	<netdev@...r.kernel.org>,
	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
Subject: RE: e1000 performance issue in 4 simultaneous links

Breno Leitao wrote:
> When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
> of transfer rate. If I run 4 netperf against 4 different interfaces, I
> get around 720 * 10^6 bits/sec.

This is actually a known issue that we have worked on with your company
before.  It comes down to your system's default behavior of round-robining
interrupts (see cat /proc/interrupts while running the test) combined with
the way e1000 exits and reschedules NAPI polling.

The default round robin behavior of the interrupts on your system is the
root cause of this issue, and here is what happens:

1. The 4 interfaces start generating interrupts; if you're lucky, the
   round robin balancer has them all on different CPUs.

2. As the e1000 driver goes into and out of polling mode, the round
   robin balancer keeps moving each interrupt to the next CPU.

3. Eventually 2 or more driver instances end up on the same CPU.
   Because there are always more than "netdev->weight" packets to do
   for each instance, those instances stay in NAPI polling mode, which
   keeps *hardware* interrupts for their interfaces *disabled* (see the
   sketch below).

4. Staying in NAPI polling mode drives up CPU utilization on that one
   processor, which guarantees that when the hardware round robin
   balancer moves any other network interrupt onto that CPU, it too
   joins the NAPI polling chain.

5. So no matter how many processors you have, this round robin style of
   hardware interrupt balancing guarantees that if there is more than
   "weight" worth of work to do at each softirq, all network interfaces
   eventually end up on the same (busiest) CPU.  Your performance
   becomes the same as if you had booted with maxcpus=1.
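
To make the "weight" part concrete, here is a simplified sketch of the
old (2.6.23-era) ->poll() contract.  This is not the actual e1000 code;
clean_rx_ring() and hw_enable_irq() are made-up helpers.  It is only
meant to show why having more than "weight" packets pending keeps the
hardware interrupt masked:

#include <linux/kernel.h>       /* min() */
#include <linux/netdevice.h>    /* struct net_device, netif_rx_complete() */

/* Made-up helpers, declared here only so the sketch is self-contained. */
int clean_rx_ring(struct net_device *netdev, int work_to_do);
void hw_enable_irq(struct net_device *netdev);

/*
 * Simplified sketch of the old dev->poll() NAPI callback, not the real
 * e1000 code.  The NET_RX softirq calls this with a budget derived from
 * netdev->weight while the NIC's hardware interrupt stays masked.
 */
static int example_poll(struct net_device *netdev, int *budget)
{
        int work_to_do = min(*budget, netdev->quota);
        int work_done  = clean_rx_ring(netdev, work_to_do);

        *budget       -= work_done;
        netdev->quota -= work_done;

        if (work_done < work_to_do) {
                /* Ring drained: leave the poll list and re-enable this
                 * NIC's hardware interrupt. */
                netif_rx_complete(netdev);
                hw_enable_irq(netdev);
                return 0;
        }

        /* More than "weight" packets were pending: stay on this CPU's
         * poll list with the hardware interrupt still masked, and get
         * called again from the same softirq. */
        return 1;
}

As long as every pass finds a full budget's worth of packets, the driver
keeps returning 1 and the hardware interrupt for that interface is never
re-enabled, which is exactly the chained-to-one-CPU behavior described
above.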

I hope this explanation makes sense, but what it comes down to is that
combining hardware round robin balancing with NAPI is a BAD IDEA.  In
general, hardware round robin balancing of interrupts is bad policy, and
I'm sure it is causing all sorts of other performance issues that you
may not even be aware of.

I'm sure your problem will go away if you run e1000 in interrupt mode.
(use make CFLAGS_EXTRA=-DE1000_NO_NAPI)
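
For reference, the difference between the two modes comes down to what
the interrupt handler does with the work.  Very roughly (again with
made-up helper names, not the driver's exact code):

#include <linux/interrupt.h>    /* irqreturn_t, IRQ_HANDLED */
#include <linux/netdevice.h>    /* struct net_device, netif_rx_schedule() */

/* Made-up helpers, declared here only so the sketch is self-contained. */
int clean_rx_ring(struct net_device *netdev, int work_to_do);
void hw_disable_irq(struct net_device *netdev);

#define EXAMPLE_RX_BUDGET 256   /* illustrative per-interrupt budget */

/*
 * Very rough sketch of the NAPI vs. non-NAPI interrupt paths; this is
 * not the driver's exact code.
 */
static irqreturn_t example_intr(int irq, void *data)
{
        struct net_device *netdev = data;

#ifdef E1000_NO_NAPI
        /* Interrupt mode: clean the rings right here, in this
         * interface's own interrupt context, on whichever CPU took
         * the IRQ.  There is no shared softirq poll list for busy
         * interfaces to pile up on. */
        clean_rx_ring(netdev, EXAMPLE_RX_BUDGET);
#else
        /* NAPI mode: mask this NIC's interrupt and queue the device on
         * the current CPU's poll list; the rings are cleaned later from
         * the NET_RX softirq via the poll callback sketched above. */
        hw_disable_irq(netdev);
        netif_rx_schedule(netdev);
#endif
        return IRQ_HANDLED;
}

In interrupt mode each interface's work stays tied to its own interrupt,
so even with the round robin balancer moving interrupts around you do not
get the one-CPU polling chain described above.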
 
> If I run the same test against 2 interfaces I get a 940 * 10^6
> bits/sec transfer rate also, and if I run it against 3 interfaces I
> get around 850 * 10^6 bits/sec performance.
> 
> I got these results using the upstream netdev-2.6 branch kernel plus
> David Miller's 7 NAPI patches set[1]. On kernel 2.6.23.12 the result
> is a bit worse: the transfer rate was around 600 * 10^6 bits/sec.

Thank you for testing the latest kernel.org kernel.

Hope this helps.
