Date:	Wed, 18 Jun 2008 12:18:30 -0700
From:	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
To:	<vatsa@...ux.vnet.ibm.com>, <linux-kernel@...r.kernel.org>,
	<e1000-devel@...ts.sourceforge.net>
Cc:	<greg@...ah.com>, <varunc@...ux.vnet.ibm.com>,
	<jbarnes@...tuousgeek.org>
Subject: RE: Strange problem with e1000 driver - ping packet loss

Srivatsa Vaddagiri wrote:
> Hi,
> 	I happened to look at a system which was exhibiting poor ping
> performance with e1000 driver (in 2.6.25) and had some questions
> regarding that. 
> ...

> Upon some investigation, I found that the interrupt count field in
> /proc/interrupts (associated with eth1) is not incrementing as fast as
> it should. Moreover eth1 interrupt line is shared with the hard disk
> interrupt (ata_piix) as below:
> 
> # cat /proc/interrupts
>  10:       2296    XT-PIC-XT        ata_piix, eth0, eth1

what's wrong with your system that you can't use acpi and/or apic?  It
would probably solve the problem orthogonally by unsharing your
interrupt.
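
A minimal sketch for spotting shared interrupt lines like the one quoted
above. It assumes the /proc/interrupts layout shown in the report (one or
more count columns, a single controller token, then comma-separated
device names); newer kernels print extra columns that this does not
handle. The function names are made up for illustration.

```python
def parse_irq_line(line):
    """Split one /proc/interrupts row into (irq, counts, controller, devices)."""
    irq, _, rest = line.partition(":")
    tokens = rest.split()
    counts = []
    # Leading integer tokens are the per-CPU interrupt counts.
    while tokens and tokens[0].isdigit():
        counts.append(int(tokens.pop(0)))
    # Assumption: exactly one token names the interrupt controller.
    controller = tokens.pop(0) if tokens else ""
    devices = [d.strip() for d in " ".join(tokens).split(",") if d.strip()]
    return irq.strip(), counts, controller, devices


def shared_irqs(text):
    """Return {irq: [devices]} for every IRQ claimed by more than one device."""
    shared = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the CPU header row
        irq, _counts, _controller, devices = parse_irq_line(line)
        if len(devices) > 1:
            shared[irq] = devices
    return shared


if __name__ == "__main__":
    sample = " 10:       2296    XT-PIC-XT        ata_piix, eth0, eth1"
    print(shared_irqs(sample))
```

On a live system the input would be the contents of /proc/interrupts
rather than the single sample line.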

> IRQ10 is thus being shared by both the hard disk and eth0/eth1.

bad for performance but should really work okay.
 
> Here's the strange observation I made:
> 
> When I initiate some disk activity (ex: dd if=/dev/zero
> ...

> This meant that e1000 NIC is having trouble interrupting the OS.

you're correct here, there appears to be some problem on your system
either with interrupt delivery or with the driver masking off interrupts
and leaving them disabled.
 
> Before I could jump up and say this is a hardware issue, I was told
> that Windows works just fine on the server (as does the 2.4 kernel,
> which I couldn't verify) :(

well it might be a BIOS issue, but would likely be solved by using boot
option acpi=force and/or lapic (see kernel-parameters.txt).

> Some more observations:
> 
> 1. I tried setting e1000 parameters (RxIntDelay=0, RxAbsIntDelay=0,
>    TxIntDelay=0, TxAbsIntDelay=0, InterruptThrottleRate=0). None of
>    them helped.

these won't help you get an interrupt delivered or re-enabled
 
> 2. When ping performance was poor, readprofile showed that system
>    is mostly idle. This confirms that OS is not getting interrupts
>    from eth1 very frequently and hence idling.

expected, thanks for checking.

> 3. When ping performance was poor, ethtool -S eth1 showed that
>    rx_bytes was incrementing at a good pace, showing that the
>    NIC was receiving ping responses back, but not handing them over
>    to OS for further processing

also expected for an interrupt problem.
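
As a rough illustration of that cross-check: if rx_bytes keeps growing
between two samples while the interrupt count for the NIC's IRQ does not
move, the adapter is receiving frames that the OS is never told about.
The sketch below is hypothetical; the names and thresholds are invented,
and on a shared line such as IRQ 10 the count also includes disk
interrupts, so treat the result as a hint only.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    rx_bytes: int   # e.g. the rx_bytes counter from `ethtool -S eth1`
    irq_count: int  # count column for the NIC's IRQ in /proc/interrupts


def looks_interrupt_starved(before, after, min_rx_bytes=1500, max_irqs=0):
    """True if the NIC received data but (almost) no interrupts were counted.

    min_rx_bytes and max_irqs are illustrative thresholds, not values
    taken from the driver or from the original report.
    """
    rx_delta = after.rx_bytes - before.rx_bytes
    irq_delta = after.irq_count - before.irq_count
    return rx_delta >= min_rx_bytes and irq_delta <= max_irqs
```

Two samples taken a few seconds apart during a ping run would be enough
to feed it.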
 
> 4. e1000 chipset is 82546GB
> 
> 5. e1000e driver didn't work at all (it doesn't recognize the cards).

expected, this is a PCI-X adapter and e1000e only handles the PCI
Express parts.
 

> Any advice on how to fix this problem?

try the boot options first, then if that doesn't work for you, download
ethregs from e1000.sourceforge.net download area and compile/run it and
send me the output in private email.

if you have a spare moment, you can try the e1000-8.X driver from
sourceforge and let me know if it works okay, that would imply we just
need to patch the in-kernel driver to fix an already known issue.

Jesse